The Journal of Consulting Psychology will
accept Brief Reports of research studies in 
clinical psychology for early publication with- 
out expense to the author. The procedure is 
intended to permit the publication of soundly 
designed studies of specialized interest or lim- 
ited importance which cannot now be ac- 
cepted because of lack of space. Several pages 
in each issue will be devoted to Brief Reports, 
published in the order of their receipt with- 
out respect to the dates of receipt of the regu- 
lar articles. Most Brief Reports appear in the 
first or second issue to go to press following 
their final acceptance. 


An author who wishes to submit a Brief 
Report: 


1. Sends the Brief Report, limited to one printed 
page and prepared according to the specifications 
given below. 

2. Also sends to the Editor a full report of the re- 
search study, in sufficient detail to give a clear ac- 
count of its background, procedure, results, and con- 
clusions, which will be filed with the American 
Documentation Institute to insure indefinite avail- 
ability. 


3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 


4. Agrees not to submit the full report to another 
journal of general circulation. 


Specifications 


Brief Report. The Brief Report should give 
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 


Brief Reports 


To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the 
author’s lines, must not exceed 70 lines av- 
eraging 42 characters and spaces in length. 
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style (1). 
Headings, tables, and references are avoided 
or, if essential, must be counted in the 70 
lines. Each Brief Report must be accom- 
panied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 70-line quota:¹


¹ An extended report of this study may be ob-
tained without charge from John Doe, 300 Market
St., Prospect 6, Mass. (giving the author’s full name 
and address), or for a fee from the American Docu- 
mentation Institute. Order Document No. —— from 
ADI Auxiliary Publications Project, Photoduplica- 
tion Service, Library of Congress, Washington 25, 
D. C., remitting in advance $—— for microfilm or 
$—— for photocopies. Make checks payable to Chief, 
Photoduplication Service, Library of Congress. 


Extended report. The full report is pre- 
pared in the style specified by the Publica- 
tion Manual (1), except that it may be typed 
with single spacing for economy in photo- 
duplication by the ADI. 
A Reanalysis of Beck’s “Six Schizophrenias”¹


John J. Conger, William L. Sawrey 


University of Colorado School of Medicine 


and Leonard F. Krause 


University of Denver 


The variations in symptomatology, previous 
history, and ultimate prognosis of patients 
diagnosed as schizophrenic have led a num- 
ber of investigators to question whether the 
term itself has any fixed meaning. Fenichel, 
for example, remarks, “Occasionally it has
been doubted whether a comprehensive ori-
entation is possible at all and whether the di-
verse schizophrenic phenomena actually have
anything in common. The label ‘schizo-
phrenia’ is applied to so many things that it
is not even of value for the purpose of prog- 
nosis. . . . Certainly, ‘schizophrenia’ is not a 
definite nosological entity, but rather em- 
braces a whole group of diseases” (2, p. 415). 

In order to lend schizophrenic diagnoses
more precise connotations in terms of etiology,
symptomatology, and prognosis, various in-
vestigators have attempted to divide the over-
all diagnosis of schizophrenia into subcate-
gories. The number of differentiated subtypes
with which they have emerged has ranged
from the “process”-“reactive” dichotomy
urged by some investigators to the tradi-
tional four subtypes of simple, hebephrenic,
catatonic, and paranoid. None of these classi-
fications, based in large measure on symptom-
atology, has proved highly effective in prac-
tice, either in terms of prediction or control
(1).

Recently, dissatisfaction with the usefulness
of existing approaches led Beck and his asso-
ciates to conduct an interdisciplinary study
aimed at differentiating schizophrenic cases on
an empirical basis, principally, as he says, in
terms of variations in “ego functioning.” A
preliminary outline of possible variations in
psychological defenses and ego functions was
prepared by a number of psychiatrists under
Grinker. This outline was then broken down
by Stephenson into 120 separate items de-
scriptive of personality functioning. Q sorts
of these items were then made by psycholo-
gists and psychiatrists for groups of outpa-
tients previously diagnosed as schizophrenic.
The psychiatrists (referred to as P in this
study) sorted the items on the basis of clini-
cal interviews with the patients, the psycholo-
gists (referred to as A and B) on the basis
of individual Rorschach protocols. Each pa-
tient was rated by one psychiatrist and at
least one, usually two, psychologists. All of
the sorts for a group of patients were then
intercorrelated and a factor analysis made of
the resulting matrix.

¹ Based on a paper presented at the annual meet-
ing of the Midwestern Psychological Association,
Chicago, Illinois, 1955.


Three such analyses were made. Initially Beck and 
his co-workers began by rating 20 patients, 12 adults 
and 8 children. Since an inordinately large correla- 
tion matrix would have resulted had an attempt 
been made to factor-analyze the total group, it was
subdivided “into an ‘agreement’ group, i.e., one in 
which psychiatrists and Rorschach test investigators 
showed agreement in judging the patients; and a 
‘disagreement’ group” (1, p. 71). Apparently, factor 
analyses of the intercorrelations between raters with 
respect to individual patients were used as the basis 
for assignment to the agreement or disagreement 
group. Presumably when all the raters had similar 
factor loadings on a patient, he was assigned to the 
agreement group. When they did not, he was as- 
signed to the disagreement group. The exact degree 
of similarity required for assignment to the agree- 
ment group could not be ascertained from the in- 
formation available. However, real questions arise in 
this connection, since, as Shaffer points out, “. . .
the psychiatrists and psychologists seemed in better
agreement on 3 of the 8 ‘disagreement’ cases than on
3 of the 12 ‘agreement’ ones” (3, p. 472). At any 
rate, the division was made primarily for purposes 
of convenience in handling the data. In addition to 
these two groups, a third group of 20 schizophrenic 
children, some of them repeaters from the first two 
groups, was also studied. 


On the basis of these three analyses, Beck 
and his collaborators derived five factors. 
These factors, then, led them to define, “by 
their juxtaposition,” six types or classes of 
schizophrenic cases (1, p. 226). 


The Present Problem 


While we are not concerned here with the 
merits of Q technique, we question the meth- 
odology employed in obtaining Beck’s results, 
and feel that it casts serious doubts on the 
validity of his conclusions. In the case of 
each of the above analyses, the group correla- 
tion matrix consisted of X individual ratings 
of Y patients. Thus, each correlation coeffi- 
cient represents a combination of rater and 
patient characteristics. In other words, the 
amount of variance the correlation accounts 
for is distributed between the two in indeter- 
minate amounts. Therefore, when this corre- 
lation matrix is factor-analyzed it is virtually 
impossible to tell to what extent the factors 
represent raters or patients. 
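
The confounding described above can be illustrated with a small simulation. This is only a sketch under a stated assumption, not Beck's data: each Q sort is taken to be the sum of a patient component, a rater component, and random error, and the names `simulate_sorts` and `corr` are hypothetical. Under that assumption, two sorts correlate about as highly when they share only a rater as when they share only a patient, so a single coefficient cannot say which source produced it.

```python
import numpy as np

def simulate_sorts(n_items=120, n_patients=4, n_raters=3,
                   patient_sd=1.0, rater_sd=1.0, noise_sd=0.5, seed=0):
    """Simulate Q sorts as patient profile + rater style + error (an assumed model)."""
    rng = np.random.default_rng(seed)
    patient = rng.normal(0.0, patient_sd, (n_patients, n_items))
    rater = rng.normal(0.0, rater_sd, (n_raters, n_items))
    sorts = {}
    for r in range(n_raters):
        for p in range(n_patients):
            error = rng.normal(0.0, noise_sd, n_items)
            sorts[(r, p)] = patient[p] + rater[r] + error
    return sorts

def corr(a, b):
    """Product-moment correlation between two sorts."""
    return float(np.corrcoef(a, b)[0, 1])

sorts = simulate_sorts()
# Different raters, same patient: shared variance comes from the patient.
print("same patient only:", round(corr(sorts[(0, 0)], sorts[(1, 0)]), 2))
# Same rater, different patients: shared variance comes from the rater.
print("same rater only:  ", round(corr(sorts[(0, 0)], sorts[(0, 1)]), 2))
# Different rater and different patient: nothing is shared.
print("neither shared:   ", round(corr(sorts[(0, 0)], sorts[(1, 1)]), 2))
```

With equal patient and rater variances, as assumed here, the first two coefficients have the same expected value; changing `patient_sd` or `rater_sd` shifts the balance between them.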


Procedure 


Another method of analysis might separate 
the effects of raters and patients. Assume 
that from Beck’s over-all matrix we extract 
the individual intercorrelation matrices for 
the three raters, and factor them separately. 
If Beck’s factors are in fact due primarily to 


Table 1
Rotated Factor Matrix, Rater A

Subject     Factor   Factor   Factor
number†     I        II       III
3                    .82
5                    .77
13                   .49      .41
14                   .50      .44
8           .74
19          .74
4           .41      .54

Note. Loadings with an absolute value under .40 are omitted.
† These numbers correspond to the numbers used by Beck.


Table 2
Rotated Factor Matrix, Rater B

Subject     Factor   Factor        Factor
number†     I        II            III
3                    .86
5                    .76
13                   [illegible]   .69
14                   .45           .62
1           .87
9           .83
11          .80
12          .87
15          .65
8           .88
19          .77
4           .61                    -.41

Note. Loadings with an absolute value under .40 are omitted.
† These numbers correspond to the numbers used by Beck.


patient differences, we should then emerge 
with three quite similar factor patterns. This 
was done, in each case attempting to rotate 
toward simple structure in a fashion as simi- 
lar to Beck’s as possible. 
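
The extraction step just described can be sketched as follows. The data here are random and purely hypothetical, the function names are ours, and unrotated principal-component loadings are used only as a stand-in, since the factoring and rotation procedure is described above only as an attempt at simple structure.

```python
import numpy as np

def rater_submatrix(corr, labels, rater):
    """Return the block of `corr` holding one rater's sorts, plus the patient ids."""
    idx = [i for i, (r, _) in enumerate(labels) if r == rater]
    return corr[np.ix_(idx, idx)], [labels[i][1] for i in idx]

def principal_loadings(corr, n_factors):
    """Unrotated principal-component loadings (a stand-in for the original factoring)."""
    vals, vecs = np.linalg.eigh(corr)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_factors]     # keep the largest n_factors
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Hypothetical data: 3 raters x 4 patients, 120 items per sort.
rng = np.random.default_rng(1)
labels = [(r, p) for r in "ABP" for p in range(1, 5)]
sorts = rng.normal(size=(len(labels), 120))
overall = np.corrcoef(sorts)                       # 12 x 12 over-all matrix

for rater in "ABP":
    sub, patients = rater_submatrix(overall, labels, rater)
    loadings = principal_loadings(sub, n_factors=2)
    print(rater, patients, loadings.shape)
```

Each submatrix contains only correlations between sorts made by the same rater, so whatever factors it yields cannot reflect differences between raters.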


Results and Interpretation 


Agreement group. From his analysis of the 
over-all agreement group, Beck concluded 
that there emerged four factors, defining 
three schizophrenic types, S-1, S-2, and S-3. 
A somewhat different picture emerged when 
we analyzed each of the three raters’ matrices 
separately. Rater A’s rotated factor matrix is 
shown in Table 1. Assuming an absolute fac- 
tor loading of .40 or larger to be meaningful 
enough to help identify a factor, it is clear 
that individuals 8, 19, and 4 identify Factor 
I. Similarly, Factor II is identified by indi- 
viduals 3, 5, 13, 14, and 4, and Factor III by 
individuals 13 and 14. After extraction of 
these three factors, it can be seen that none 
of the remaining loadings are great enough 
to account for further factors. 
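
The identification rule used here (an absolute loading of .40 or larger) is simple enough to state as a short sketch. The loadings are those read from Table 1 for Rater A; the function name and data layout are hypothetical.

```python
# Loadings for Rater A as read from Table 1 (entries under .40 were omitted there).
RATER_A = {
    3:  {"II": 0.82},
    5:  {"II": 0.77},
    13: {"II": 0.49, "III": 0.41},
    14: {"II": 0.50, "III": 0.44},
    8:  {"I": 0.74},
    19: {"I": 0.74},
    4:  {"I": 0.41, "II": 0.54},
}

def identifying_subjects(loadings, threshold=0.40):
    """Map each factor to the subjects whose absolute loading reaches the threshold."""
    found = {}
    for subject, row in loadings.items():
        for factor, value in row.items():
            if abs(value) >= threshold:
                found.setdefault(factor, []).append(subject)
    return found

for factor, subjects in sorted(identifying_subjects(RATER_A).items()):
    print("Factor", factor, "identified by", subjects)
```

Raising the threshold above .41 would remove individual 4 from Factor I and individual 13 from Factor III, which shows how much the identification of a factor depends on the cutoff chosen.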

The corresponding matrix for Rater B is 
shown in Table 2. In this case, Factor I is 
identified by individuals 1, 9, 11, 12, 15, 8, 
19, and 4. Factor II is identified by indi- 
viduals 3, 5, 13, and 14. The only remaining 
factor, Factor III, is identified by individuals 
13, 14, and 4. 

Rater P’s matrix is shown in Table 3. Here 
Factor I is identified by individuals 1, 9, 11, 





Table 3
Rotated Factor Matrix, Rater P

Subject     Factor   Factor   Factor   Factor
number†     I        II       III      IV
3                    .74
5                    .54               -.42
13                   .65      .45
14                            .59
1           .69                        -.63
9           .64
11          .61
12          .68
15          .63      -.53
8           .72
19                   .51
4           .44

Note. Loadings with an absolute value under .40 are omitted.
† These numbers correspond to the numbers used by Beck.


12, 15, 8, and 4. Similarly Factor II is identi- 
fied by individuals 3, 5, 13, 15, and 19. Fac- 
tor III is identified by individuals 13 and 14. 
Finally, the remaining factor, Factor IV, is 
identified by individuals 5 and 1. 

How similar are these individual factor 
matrices? 

It should be immediately apparent that 
there exists some disagreement between the 
various raters as to the number of factors 
(and consequently the possible number of 
types of schizophrenia) present in this popu-
lation. Raters A and B, for example, emerge
with only three, while Rater P finds four. 
Furthermore, there appear to be interrater 
differences in terms of the individuals identi- 
fied by apparently parallel factors. The corre- 
sponding factor patterns for each of the three 
raters are presented in Table 4. 

As may be seen, there is only fair agree- 
ment among all three raters in defining Fac- 
tor II. They all assign moderate or high load- 
ings on this factor to individuals 3, 5, and 
13, and negligible loadings to individual 8.
Furthermore, Raters B and P agree in assign- 
ing negligible loadings to individuals 1, 9, 11, 
and 12 (these individuals were not rated by 
Rater A); however, they disagree on indi- 
vidual 15. In addition, the three raters dis- 
agree on individuals 14, 19, and 4. Raters A 
and B, for example, assign a moderate posi- 
tive loading to individual 14, while Rater P 
gives him a negligible loading. While Raters 
A and B agree in assigning a negligible load- 
ing to individual 19, Rater P gives him a 
moderate positive loading. In the case of in- 
dividual 4, Raters B and P agree in assigning 
him a negligible loading, while Rater A gives 
him a moderate positive loading. 

Table 4
Combined Factor Patterns of all Raters for the “Agreement” Group

Subject    Factor I       Factor II      Factor III     Factor IV
number†     A    B    P    A    B    P    A    B    P    P
3           0    0    0    8    9    7    0    0    0    0
5           0    0    0    8    8    5    0    0    0   -4
13          0    0    0    5    6    7    4    7    5    0
14          0    0    0    5    5    0    4    6    6    0
1           *    9    7    *    0    0    *    0    0   -6
9           *    8    6    *    0    0    *    0    0    0
11          *    8    6    *    0    0    *    0    0    0
12          *    9    7    *    0    0    *    0    0    0
15          *    7    6    *    0   -5    *    0    0    0
8           7    9    7    0    0    0    0    0    0    0
19          7    8    0    0    0    5    0    0    0    0
4           4    6    4    5    0    0    0   -4    0    0

Note. Digits in the above table indicate factor loadings expressed to a single significant figure and with the decimal points omitted. Loadings below an absolute value of .40 are indicated as 0.
† These numbers correspond to the numbers used by Beck.
* These subjects were not rated by A.

On Factor I, there is somewhat better agree-
ment than on Factor II. All three raters agree
that individuals 3, 5, 13, and 14 have negli-
gible loadings on this factor and that indi-
viduals 8 and 4 have moderate to high posi-
tive loadings. In addition, Raters B and P
agree in assigning positive loadings to indi-
viduals 1, 9, 11, 12, and 15. In the case of
individual 19, however, Raters A and B dis-
agree strongly with Rater P. While A and B
both assign him a high positive loading, P
gives him a negligible loading. Whether there
would have been additional agreement or dis-
agreement if Rater A had rated all subjects,
we do not know.

On Factor III, all three raters agree in as- 
signing negligible loadings to individuals 3, 
5, 8, and 19, and moderate positive loadings 
to individuals 13 and 14. Raters B and P 
agree in assigning negligible loadings to indi- 
viduals 1, 9, 11, 12, and 15. However, while 
Raters A and P assign negligible loadings to 
individual 4, Rater B assigns him a moderate 
negative loading. 

When the above three factors are removed 
from the various factor matrices, the situation 
becomes more complicated. Raters A and B 
have, in effect, no Factor IV; that is, on no 
individual do they have any remaining fac- 
tor loadings above .40. Rater P has a fourth 
factor, identified negatively by individuals 5 
and 1. 

The above comparisons make it clear that 
there is disagreement between the various 
raters both as to the number of factors pres-
ent in this patient population, and also as
to which persons identify apparently parallel 
factors. In his analysis of the over-all agree- 
ment intercorrelation matrix, Beck obtained 
four factors. Since in our analysis of the in-
dividual raters’ matrices, Rater P is the only
one having a fourth factor, it appears that 
Beck’s fourth factor was probably a result of 
disagreement among raters and confounding 
of raters and individuals. 

It is also possible to view Table 4 in terms 
of the amount of rater agreement as to the 
factors characterizing each patient. For ex- 
ample, it can be seen that all raters agree 
that individual 3 is characterized strongly by 
Factor II, and not by Factor I or III. On
the other hand, there is relatively poor rater 
agreement as to the factors characterizing in- 
dividual 4. Raters B and P agree that indi- 
vidual 4 has negligible loadings on Factor II. 
Rater A, however, gives him a moderate posi-
tive loading on this factor. On Factor III,
Raters A and P give this patient negligible 
loadings, while Rater B gives him a moderate 
negative loading. All agree that Factor I 
characterizes this patient. Rater P assigns 
individual 4 a negligible loading on his Fac- 
tor IV. 

Let us first consider the seven patients on 
whom ratings by all three judges were avail- 
able. Since three factors were extracted, there 
are 21 possible instances in which all three 
raters may agree. However, in only 15 of 
these instances does such agreement occur, 
even when all positive loadings above .40 are 
considered identical and when anything from 
-.39 to +.39 is considered to be zero in the
table. For the remaining five patients who 
were rated by only two raters, there are 15 
possible instances of agreement. If we con- 
sider all positive loadings above .40 as identi- 
cal for our purposes, there are 14 instances 
of agreement. 
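
The counting rule can be made explicit with a short sketch for the five patients rated only by B and P, using the single-digit codes of Table 4. The function names are hypothetical, and treating loadings of -.40 or beyond as a third, separate category is our reading of the rule rather than something stated in so many words above.

```python
# Single-digit codes from Table 4, given as (Rater B, Rater P) for each factor.
TABLE4_B_P = {
    1:  {"I": (9, 7), "II": (0, 0),  "III": (0, 0)},
    9:  {"I": (8, 6), "II": (0, 0),  "III": (0, 0)},
    11: {"I": (8, 6), "II": (0, 0),  "III": (0, 0)},
    12: {"I": (9, 7), "II": (0, 0),  "III": (0, 0)},
    15: {"I": (7, 6), "II": (0, -5), "III": (0, 0)},
}

def category(code):
    """Collapse a coded loading to +1 (positive), 0 (negligible), or -1 (negative)."""
    return (code > 0) - (code < 0)

def count_agreement(table):
    """Count patient-by-factor instances in which every rater falls in one category."""
    agree = total = 0
    for factors in table.values():
        for codes in factors.values():
            total += 1
            if len({category(c) for c in codes}) == 1:
                agree += 1
    return agree, total

agree, total = count_agreement(TABLE4_B_P)
print(f"{agree} of {total} instances agree")
```

The single disagreement in this subset is individual 15 on Factor II, where Rater P assigns a negative loading and Rater B a negligible one.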

In our opinion, the limited agreement among 
raters shown in the above analyses does not 
justify Beck’s contention that clear-cut fac- 
tors, based on patient differences, emerge from 
his analysis of the “agreement” group. 

Disagreement group. It is somewhat diffi- 
cult to interpret Beck’s statements as to the 
factors which he feels emerge from an analy- 
sis of his disagreement group. At one point, 
he states that four factors emerge, but three 
of these are the same Factors II, III, and IV 


Table 5
Combined Factor Patterns of all Raters for the
“Disagreement” Group

Subject     Factor I     Factor II    Factor III
number†     A   B   P    A   B   P    B
2 et 0 0 0 0 
6 0 0 6 0 —7 0 0 
7 5 £c@ 0 0 5 —8 
10 6. £4 0 0 —6 
16 Se. -7 -—7 -—6 0 
17 0 5 —7 -8 0 0 
18 Ps Fg 0 0 —7 0 
20 2 G8 0 0 6 —5 





Note.—Digits in the above table indicate factor loadings 
expressed to a single significant figure and with the decimal 
points omitted. Loadings below an absolute value of .40 are 
indicated as 0. 

† These numbers correspond to the numbers used by Beck.

of his agreement group. We are somewhat at
a loss to determine how he arrived at this 
conclusion. Since he included in his disagree- 
ment rotated factor matrix three inputs la- 
beled S-1, S-2, and S-3, it would appear that 
these were somehow obtained from his agree- 
ment analysis, and included as reference vari- 
ables. However, if this were actually the case, 
these inputs should load up on different fac- 
tors in the disagreement analysis. Actually, 
however, they all load up on Beck’s disagree- 
ment Factor II. Because of these and similar 
problems, we do not feel capable of com- 
menting on Beck’s analysis and interpreta- 
tion of the disagreement group. 

However, we were again able to extract and 
analyze the individual raters’ intercorrelation
matrices as in the case of the agreement 
group. From the results, the factor patterns 
shown in Table 5 were prepared. As may 
be seen, Rater A emerges with two factors, 
Rater B with three factors, and Rater P with 
two factors. It seems to us that these data 
emphasize the distortions that may result 
from analyzing an over-all intercorrelation 
matrix which confounds raters and patients. 
For while no individual rater emerges with 
more than three factors, Beck isolated four 
factors in his over-all analysis. It does not 
seem fruitful to investigate rater agreement 
further, since this group was explicitly se- 
lected on the basis of disagreement. However, 


one again wonders about Beck’s criteria for 
disagreement since all raters were in agree- 
ment as to the factor loadings of individuals 
2 and 16. 

The data from Beck’s analysis of his third 
group of 20 schizophrenic children are not 
available, and consequently cannot be com- 
mented upon. 


Summary 


In the present analysis we have demon- 
strated (a) that there is disagreement among 
raters as to the number of factors present in 
this population of patients, and (b) that
there is also disagreement as to which per-
sons identify apparently parallel factors.

In light of this evidence of interrater un- 
reliability, we do not feel that Beck is justi- 
fied in his conclusions as to the factors, and 
consequently the schizophrenic types which 
characterize his population. 


Received June 3, 1955. 


References 


1. Beck, S. J. The six schizophrenias. Res. Monogr.
No. 6. New York: Amer. Orthopsychiat. Ass.,
1954.

2. Fenichel, O. The psychoanalytic theory of neu-
rosis. New York: Norton, 1945.

3. Shaffer, L. F. Review of S. J. Beck, The six
schizophrenias. J. consult. Psychol., 1954, 18,
472-473.








Journal of Consulting Psychology 
Vol. 20, No. 2, 1956 


A Reply to the Reanalysis of Beck’s 
“Six Schizophrenias” 


William Stephenson 


Greenwich, Connecticut 


The paper by Conger, Sawrey, and Krause 
(2) reanalyzes data in Beck’s monograph 
(1) and questions the initial analysis on the 
ground that it involves a combination of rater 
and patient variances in unknown or unknow- 
able proportions. Their own alternative analy- 
sis results, in one case, in agreement being 
shown in 15 out of 21 instances, and in an- 
other case in 14 out of 15. They conclude 
that this is “limited” agreement, and evidence 
of “interrater unreliability.” Having regard to 
the complex matter involved, the present au- 
thor wishes that he could be as successful on 
the racecourse as Beck’s colleagues were in
the clinic. Other considerations suggest, too, 
that the sounder conclusion, even when the 
data are analyzed in a different way, is 
that considerable evidence is forthcoming for 
Beck’s hypotheses. 

The point at issue is as follows. The re- 
analysis brings into focus rater specificities 
rather than factors of more general range. If 
a matrix is factored for a particular Rater A, 
factors are apt to involve his specificities, due 
to his greater or lesser training and the like; 
and while it may be of interest to study such 
factors, they are unlikely to be of immediate 
interest or necessarily relevant to one’s more
general theoretical questions. Clearly, in this 
connection, the reanalysis involves itself only 
in certain factors, and not in their factor- 
arrays, without which Conger, Sawrey, and 
Krause are working completely in the dark. 
Beck, however, was working in the full day- 
light of his factor-arrays, and not with mere 
factors as such or the possibility of specifici- 
ties. His concern was with the general ques- 
tion as to whether different psychologists and 
psychiatrists, over a range of different pa-
tients, could provide factors of theoretical in-
terest, the latter matters being judged by
the factor-arrays at issue. The randomizing 
of raters and patients involves the sort of 
logic with which agronomists are familiar 
when, in their experiments, they divide a field 
into plots in such a way as to level out dif- 
ferences of fertility, shadow, and the like in 
different parts of the field. In the clinical case 
we seek to cancel out the specificities of either 
raters or patients, with the firm hope that 
generality is the more likely that way. 

The trouble, if I may say so, is that it is 
all too easy to slip into categorical mistakes 
in factor studies. To talk of “interrater un- 
reliability” is to involve oneself in such a 
mistake. It would have been quite easy to 
provide Beck and his colleagues with a Q 
sample which would result in one hundred 
per cent agreement about patients every time, 
but about quite unimportant matters. To 
reach a factor of real psychological interest 
(as judged by the factor-array), under the 
complex conditions of Beck’s Q sample, even 
if the loadings are relatively small and their 
deployment over different raters somewhat 
patchy, may be altogether far more important 
than to reach a factor with high loadings for 
everyone concerned. The latter factor would 
probably only deal with the obvious, the 
banal, or worse. With the Q sample used by 
Beck, involving as it does some complex theo- 
retical matters, the wonder is (as far as this 
author is concerned) that there was any 
agreement between psychologists themselves, 
and with the psychiatrists, rather than that 
the agreement is “limited.” 

The same kind of categorical slip is found, 
as well, in the critical remarks made by these 

Table 1
Intercorrelations and Factor Loadings for Two Patients

Patient 1
                                             Factor
Rater                 1     2     3     4    loadings
1. Psychiatrist             .16   .16   .16   .40
2. Psychologist A                 .16   .16   .40
3. Psychologist B                       .16   .40
4. Psychologist C                             .40

Patient 2 
Factor 


loadings 


Rater 1 2 3 4 I Il 
1. Psychiatrist .25 i 50 .00 
2. Psychologist A i 50 00 
3. Psychologist B 25 50 .00 
4. Psychologist C 50 0 





authors, and by Shaffer earlier (3), about the 
way in which Beck divided his 20 patients 
into an “agreement” and a “disagreement” 
group. These categories were determined not 
by size of correlation (a criterion that the 
critics appear to recommend) but by com- 
parability of factors. The examples in Table 1 
will make the difference clear. 

The average correlation for Patient 1 is 
smaller than for Patient 2 but the raters are 
in agreement about the factor at issue; in the 
other patient, however, though they are ap- 
parently in agreement about one factor, they 
differ about the other. It is easy to see that 
the situation in the latter case could be such 
that the raters could agree about Factor I by 
an amount less than 0.40, with Factor II cor- 
respondingly higher for all four raters. Every- 
thing will depend upon how these Patients 1 
and 2 appear in a table including other pa- 
tients as well. The assumption is reasonable, 
however, that matters are likely to be straight-
forward for Patient 1, although, with other
patients and raters in the matrix, a complex 
factor situation is still likely, whereas for Pa- 
tient 2 matters are already complicated. It 
was with such considerations in mind, then, 
and not the mere size of intercorrelations, 
that Beck’s patients were divided into two 
categories for convenience. 
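
The distinction drawn here between size of correlation and comparability of factors can be checked numerically. In the sketch below, the Patient 1 loadings are those of Table 1; the second case, `split_raters`, is hypothetical and is not Patient 2 of the table. The sketch assumes the usual orthogonal factor model, in which the correlation between two raters is reproduced as the sum of the products of their loadings; the function names are ours.

```python
import numpy as np

def reproduced_correlations(loadings):
    """Correlations implied by orthogonal factor loadings (rows = raters)."""
    a = np.asarray(loadings, dtype=float)
    r = a @ a.T                  # sum over factors of products of loadings
    np.fill_diagonal(r, 1.0)     # self-correlations are set to unity
    return r

def mean_offdiag(r):
    """Average correlation between different raters."""
    mask = ~np.eye(r.shape[0], dtype=bool)
    return float(r[mask].mean())

# Patient 1 of Table 1: all four raters load .40 on a single factor.
patient_1 = [[0.40], [0.40], [0.40], [0.40]]
r1 = reproduced_correlations(patient_1)
print(np.round(r1, 2))           # every off-diagonal entry is .40 x .40 = .16

# Hypothetical contrast: raters 1-2 load on one factor, raters 3-4 on another.
split_raters = [[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]]
r2 = reproduced_correlations(split_raters)
print(round(mean_offdiag(r1), 3), round(mean_offdiag(r2), 3))
```

In the hypothetical case the average correlation is higher than for Patient 1, yet the raters do not share a factor, which is the kind of situation the agreement and disagreement categories are said to distinguish.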

It is well known that multiple factor condi- 
tions are permissive (4). An appropriate fac- 
tor methodology, therefore, is to use factor 
analysis to answer psychological questions, in 
a deductive rather than an inductive setting. 
In Beck’s case the concern was with the ques- 
tion whether ratings were, or were not, con- 
sistent with a certain theoretical position. The 
same data might be consistent with other 
theoretical positions as well, but this would 
be a matter for separate consideration. One 
should not think of such factor analysis, 
therefore, as a producer of facts for induc- 
tive regard (which is still another categorical 
mistake), but as a reproducer of facts, from 
a deductive standpoint. 

We see, then, in the short space of a single 
paper, several sources of categorical assump- 
tion, against which Q method is always di- 
rected. Beck’s initial analysis made no such 
mistakes, and is clearly on sound lines. 
University of Colorado School of Medicine


and Leonard F. Krause 
While we do not wish to belabor discussion
of the methodological differences existing be- 
tween Stephenson (4) and ourselves (2), we 
do feel that several specific points in his reply 
require comment. In the first place, Stephen- 
son remarks that even with our method of re- 
analyzing Beck’s data, we emerge, in the one 
case, with 15 out of 21 possible instances of 
rater agreement (when three raters were 
used) and, in another case, with 14 out of 15 
possible instances of agreement (when only 
two raters were used). Two points should be 
stressed in this connection. First, agreement 
of this magnitude occurs only when all posi- 
tive loadings above .40 are considered identi- 
cal and when anything from minus .39 to 
plus .39 is considered to be zero, i.e., also 
identical. Second, this amount of agreement 
obtains only for Beck’s “agreement” group, 
which was presumably preselected for inter- 
rater agreement. When the same criteria for 
rater agreement are applied to the two factors 
which all raters had in Beck’s “disagreement” 
group, agreement occurs in only 6 out of 16 
possible instances, as Table 5 of our paper 
(2) demonstrates. 

Stephenson remarks that he wishes he could 
do as well at the race track as Beck and his 
colleagues have done in the clinic. In this con- 
nection, we would like to make it clear that 
we have only admiration for the clinical skills 
displayed by Beck and his colleagues, and for 
their ingenuity in attempting to conceptualize 
the problem of schizophrenia in dynamically 
more meaningful terms. Our concern here is 
not a clinical, but solely a methodological one, 
having to do specifically with the manner in 
which the data were analyzed. Parenthetically, 


however, Stephenson’s race-track analogy does 
not impress us as particularly relevant, since 
in order to make visits to the race track ex- 
tremely profitable one need do only slightly 
better than chance. A similar degree of suc- 
cess in the clinic does not seem to be a par- 
ticularly desirable goal. 

However, as Stephenson himself points out, 
the essential point of difference between us 
does not concern the extent of interrater 
agreement in Beck’s data. It concerns, rather, 
the theoretical necessity for such agreement. 
In Stephenson’s words, “To talk of interrater 
unreliability” in discussing factor studies is 
to involve one’s self in a “categorical mis-
take.” We do not agree. As we think our re-
analysis demonstrates, replicable findings from
factor analysis cannot be assured unless a 
reasonable degree of interrater reliability ex- 
ists at the outset. Stephenson argues that “if 
a matrix is factored for a particular Rater A, 
factors are apt to involve his specificities, due 
to his greater or lesser training and the like. 
. . .” This may well be, and if so, generaliz-
ing from this rater’s findings would certainly
be misleading. That is why it is desirable 
to determine whether substantial agreement 
among a variety of raters can be obtained be- 
fore making such generalizations. 

But the problem of generality cannot, in 
our view, be solved simply by combining the 
ratings of all raters across all patients, re- 
gardless of degree of agreement, into one in- 
tercorrelation matrix, and then factoring it. 
Stephenson argues that from such a procedure 
one can derive factors of more “theoretical 
interest.” We are not entirely certain what 
Stephenson means by more theoretical inter-
est, but we are sure that one cannot be con-
fident, with this procedure, of deriving fac-
tors which, taken together in “factor-arrays,”
define “six types of schizophrenia.” To take 
an extreme example, it would be possible to 
factor-analyze the intercorrelation matrix of 
the ratings of one patient by twenty raters. 
One might well obtain a number of factors 
from which factor arrays could be constructed. 
But we do not see how these factor arrays 
could be considered to define types of schizo- 
phrenia, though they might help to define 
types of raters. 

Of course, in their analysis, Beck and 
Stephenson used a number of patients as well 
as a number of raters, but it seems to us that
the argument still applies. To the extent that 
the raters disagree, the factors obtained will 
still confound raters and patients. Where there 
is a considerable degree of interrater disagree- 
ment (as was the case in their disagreement 
group), might not new raters differ from the 
present ones even in rating the same patients? 
If so, factoring their ratings might lead to 
different factors, and consequently to new 
sets of factor arrays, or “types of schizo- 
phrenia.” This, however, would be mani- 
festly unmeaningful, it seems to us, since both 
sets of schizophrenic types would have been 
derived from the same patients. 

Stephenson mentions several other exam- 
ples to indicate that we “are working in the 
dark.” He points out that our reanalysis “in- 
volves itself only in certain factors, and not 
in their factor arrays.” We fail to understand 
the dire implications of this for the degree of 
illumination in which we are working. We ex- 
plicitly recognized that Beck and Stephenson 
were working with “factor arrays,” though 
we referred to them as “factor patterns.” But 
since it is the factors themselves which led 
these investigators to define “by their juxta- 
position” the factor arrays or types of schizo- 
phrenia (1, p. 226), it seems to us only logi- 
cal that changes in the factors would be 
reflected in changes in the arrays. 

In connection with Stephenson’s reference 
to “the sort of logic with which agronomists 
are familiar,” we would like only to note that
agronomists in their work are aware not only 
of the desirability of randomizing, with which 
we also of course agree, but of the difficulties
in interpretation which may result from
partial or complete confounding. This, we 
feel, is the primary problem in the present 
analysis. 

We are also charged with a “categorical 
slip,” as is Shaffer (3), in regard to the way 
in which Beck divided his patients into an 
“agreement” and a “disagreement” group. 
We are accused of failing to understand that 
these categories “were determined not by size
of correlation . . . but by comparability of 
factors.” 

We do not see how Stephenson could have 
arrived at this conclusion when we specifically 
stated, “Apparently, factor analyses of the 
intercorrelations between raters with respect 
to individual patients were used as a basis for 
assignment to the agreement or disagreement 
groups. Presumably when all the raters had 
similar factor loadings on a patient, he was 
assigned to the agreement group. When they 
did not he was assigned to the disagreement 
group” (1, pp. 2-3). We further stated that 
the exact degree of similarity required for as- 
signment to the agreement group could not 
be ascertained from the information available. 

We still feel this way, especially since in 
some instances in our analysis, the factor 
loadings were more comparable for the dis- 
agreement than for the agreement group. 
Consider, for example, the factor loadings of 
Individual 19 in Table 4 of our paper, and 
Individual 2 in Table 5. As these tables show, 
the loadings were less comparable for Individual
19, a member of the agreement group,
than for Individual 2, a member of the disagreement
group. This does not seem surprising
in view of the obtained intercorrelation
matrices of the three raters for these two patients
shown in Table 1 of the present paper
(data concerning Psychologist C, the fourth
rater, were not included in Beck’s monograph
and consequently cannot be included below).
Although we are still uncertain as to the exact
criteria for assignment to the agreement and
disagreement groups, our more fundamental
question concerns the usefulness of a method
which fails to take into account correlation
size, resulting in categorizations such as the
above.


Table 1

Intercorrelations of Raters for One Patient of the
Agreement Group and One Patient of
the Disagreement Group

                            Intercorrelations
Patient and rater            1      2      3

Patient 19
  1. Psychiatrist                   01     20
  2. Psychologist A                        42
  3. Psychologist B

Patient 2
  1. Psychiatrist                   24     32
  2. Psychologist A                        58
  3. Psychologist B
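
The comparison drawn from Table 1 can be checked with a few lines of arithmetic. The sketch below averages the three rater intercorrelations for each patient. It assumes that the printed table omits decimal points (so that "01" stands for .01), and the function name is illustrative only; it is not part of Beck's or Stephenson's procedure.

```python
from statistics import fmean

# Rater intercorrelations as printed in Table 1.
# Assumption: decimal points are omitted in the table, so "01" is read as .01.
TABLE_1 = {
    "Patient 19 (agreement group)": {(1, 2): 0.01, (1, 3): 0.20, (2, 3): 0.42},
    "Patient 2 (disagreement group)": {(1, 2): 0.24, (1, 3): 0.32, (2, 3): 0.58},
}


def mean_intercorrelation(pairs):
    """Average the off-diagonal correlations among the three raters."""
    return fmean(pairs.values())


if __name__ == "__main__":
    for patient, pairs in TABLE_1.items():
        print(f"{patient}: mean r = {mean_intercorrelation(pairs):.2f}")
```

Under that reading, the patient assigned to the disagreement group shows the higher average agreement among raters, which is the point made in the text.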

We would like to raise one final point. One 
of the reasons we did not proceed to analyze 
Beck’s disagreement group to the same ex- 
tent as the agreement group was that we 
could not understand his statement that in 
the disagreement group four factors emerged, 
but three of these are the same as Factors II, 
III, and IV of his agreement group. We could 
not see how this conclusion was reached. 
Since three inputs labelled S-1, S-2, and S-3 
were included in the disagreement-rotated 
factor matrix, presumably as reference vari- 
ables, it appeared to us that these were some- 
how obtained from the agreement analysis. 
If this were actually the case, however, these 
inputs should have loaded up on different fac- 
tors in the disagreement analysis. But since 
they all loaded up on Beck’s disagreement
Factor II, we still do not understand, and
Stephenson has not explained, the meaning 
of these inputs, or their relation, if any, to 
the agreement group factors. 

To summarize, despite other areas of dis- 
agreement between Stephenson and ourselves, 
our basic and apparently irreconcilable points 
of difference seem to involve differing notions 
as to the theoretical importance of interrater 
reliability and confounding in a study such as 
the present one. In Stephenson’s opinion, “to 
talk of interrater unreliability” is to involve 
one’s self in a “categorical mistake.” We sim- 
ply cannot agree. In our opinion, it is a mis- 
take to derive types of schizophrenia by fac- 
toring a matrix in which raters who disagree 
substantially with one another are confounded 
with patients. 


The Effects of Perceptual Training on the
Rorschach W and Z Scores¹


Howard Leventhal 


University of North Carolina 


Although the Rorschach is now the most 
widely used instrument in American clinical 
psychology, it has little in the way of em- 
pirically established foundations. Though a 
number of Rorschach workers have recog- 
nized the need for normative data, there still 
remains the task of demonstrating the validity 
or meaningfulness of the hypotheses concern- 
ing the variables being standardized. Direct 
experimentation with specific factors of the 
test seems to have been neglected because 
experimentation, being analytical, is counter 
to the gestalt viewpoint held by many Ror- 
schach workers. Bergmann’s discussion of ge- 
stalt principles (4, pp. 450-451) makes clear, 
however, that the interrelationship of the 
Rorschach variables necessary to describing a 
unitary personality does not preclude experi- 
mentation with any one or any combination 
of variables to discover the laws of these sub- 
systems. 

The significance of the W response has been 
defined largely in terms of intellective char- 
acteristics. Rorschach (18, p. 59) wrote that 
“The number of W responses is to be con- 
sidered primarily an indicator of the energy 
of associative activity, dependent on a dis- 
positional set of the subject.” Beck stated that 
(3, p. 10) “The Rorschach test projects de- 
gree or height of intelligence in two factors- 
whole percepts (W) and organization (Z).” 
The Z score used by Beck (3, pp. 12-13) is 
considered “a more accurate representative of 
intelligence functioning per se,” as it considers 
intermediate values of organization missed by
W. He feels that the Z scores “vary directly
as the intelligence of S.”


¹ This research was completed as a master’s thesis
at the University of North Carolina. Thanks are extended
to Drs. W. Grant Dahlstrom, June E. Chance,
and George S. Welsh for their invaluable advice and
criticism.


Previous Studies with W and Z 


If W and Z scores can be meaningfully in- 
terpreted as measures of intelligence, it should 
be possible to demonstrate both their validity 
and reliability. Any valid measure of intelli- 
gence should be reliable. Thus, W and Z, as 
indices of intelligence, should not be unduly 
influenced by extraneous events like small 
amounts of training or slight emotional and 
attitudinal shifts. Since consistent results have 
not been obtained from correlational studies 
investigating the relationship of Rorschach W 
and Z to intelligence, experimental research 
directed toward locating possible variables in- 
fluencing the variability of W and Z scores 
might provide clues as to why these scores do 
not correlate more highly with intelligence. 
Several factors have been reported to influence
W and Z responses. Stress has been found to
reduce W significantly (7). Examiner differences and
retesting produce significant changes in W% 
(15). W responses increase in a social situa- 
tion (14). As some set seems to be of im- 
portance, a number of experiments have at- 
tempted to study its influence by inducing 
various sets in subjects and measuring dif- 
ferences in responses. These sets have been 
manipulated by asking subjects to give “good” 
and “bad” records on different occasions. No 
changes in the record in W, and high intertest 
correlations, are reported in two instances (9, 
10), while low intertest correlations are reported
in another (5). Instructions to pay attention
to certain areas of the blot (12) produced
significant changes in W, as did hearing
at a lecture that intelligent people see W (1). 
Even looking at colored advertisements before 
taking the Rorschach increased the number 
of responses (17). Keyes (13), in a more 
direct attack on this problem, treated four 
groups as follows: 1, training on Street Ge- 
stalt figures plus instructions to see only W’s 
on the Rorschach; 2, training on the Street 
Gestalt test plus standard Rorschach instruc- 
tions; 3, no training but instructions to see 
W’s only; 4, no training but standard Ror- 
schach instructions. Group 1 produced sig- 
nificantly more W responses than any other 
group while neither training alone nor in- 
structions to give W only produced a signifi- 
cant increase in frequency of W’s. 


Nature of Rorschach Perceptual Processes 


Considering the Rorschach as a perceptual 
task, we might theorize about the nature of 
perceptual processes involved in making W 
responses, and variables that influence these 
processes. Perceiving a W response appears 
basically to be a problem of organizing the 
blot, while perception of subareas seems to 
involve ignoring the total gestalt and analyzing
the card into new units. These two organizing
processes seem similar to perceptual
factors found in Thurstone’s “Factorial Study 
of Perception” (20). Thurstone’s first factor, 
the A factor, seems to “represent the ability 
to form a perceptual closure against some dis- 
traction” (20, p. 101). The Street Gestalt test 
and the Gottschaldt figures have high load- 
ings on this factor. The latter test, however, 
also has a high loading on the E factor, which
Thurstone considers to represent “The ability
to shake off one set in order to take a new
one . . . it implies flexibility in manipulating 
several more or less irrelevant or conflicting 
gestalts” (20, p. 111). In contrast, the Street 
Gestalt Test does not have a high loading on 
the E factor. 


Problem 


The purpose of this study is to examine ex- 
perimentally whether the Thurstone A and E 
factors are related to W and Z responses to 
the Rorschach. This relationship can be dem- 
onstrated by contrasting Rorschach responses 
of groups pretrained on Closure figures (factor
A) with those trained on Gottschaldt figures
(factor E). Responses of both trained
groups will be compared to responses made 
by a control group receiving no pretraining. 


Hypotheses 


1. Training on a perceptual task before 
taking the Rorschach test will alter the W 
and Z responses of trained groups. 

2. The group trained on the incomplete fig- 
ures test (Thurstone’s A factor) will produce 
more W responses and receive higher organi- 
zation scores than a control group who did 
not receive perceptual training. 

3. The group trained on the Gottschaldt 
test (Thurstone’s E factor) will give fewer W 
responses and have a lower organization score 
than the control group. 


Procedure 


Forty-nine volunteer college males were 
used as subjects. The seating plan used dur- 
ing the Rorschach and the training periods 
was that of Harrower and Steiner (11, p.
119). A 300 watt Viewlex projector placed
about 21 feet from the screen gave a screen
area of 60 by 40 inches for Card I. Subjects
were told, previous to the experiment, that
they were volunteering for two experiments,
one conducted by a graduate student, and
one by a faculty member, and that a maximum
of one and one-half hours would be
needed to complete the work. They were assigned
to one of three groups depending on
when they were free. After completion of the
training period, subjects were reminded that
they were also to participate in a second experiment
and told that in the meantime they
might take a 5-minute rest. These precautions
were taken to prevent subjects from assuming
that the instructions for the pretraining period
were supposed to hold for the Rorschach.
Different experimenters conducted the two
sessions.


Table 1

Experimental Design and Hypotheses

Group   N    Training                          Rest   Rorschach performance

I       17   Closure practice using            5’     More W responses and higher
             incomplete figures                       Z scores than Groups II and III

II      13   Practice in breaking down         -      Fewer W responses and lower
             Gestalts and forming new units           Z scores than Groups I and III
             on Gottschaldt figures

III     19   No perceptual training,           5’     W and Z scores intermediate to
             verbal task                              those of Groups I and II


Group I. Group I was trained on closure figures 
before being given the Rorschach. They were shown 
35 double-frame 35 mm. slides of closure figures
from the Mooney Closure Test (16). Figures con- 
taining any irrelevant details were excluded, and for 
those with white figures on a black ground, the 
photographic negatives were used in the slide. The 
figures were projected for 20 seconds each, and sub- 
jects recorded their answers in a booklet. After the 
first presentation, the slides were again projected for 
33 seconds each. This time the experimenter gave the 
correct answers and helped subjects to see the figures. 
He emphasized that all segments must be used in 
order to see the figures. A 5-minute recess followed. 

Group II. Group II was trained on the Thurstone 
Concealed Figures Test (21), a multiple-choice modi- 
fication of the Gottschaldt figures. Figures were pro- 
jected with an opaque projector. A figure was pro- 
jected with four other more complex figures in which 
it might be hidden. Subjects were instructed to check 
all complex figures the smaller figure was hidden in. 
Two practice slides and 21 test items were used. All 
slides, except the first, were seen for 30 seconds. 
After the test, the items were again projected for 50 
seconds each and the correct answers pointed out.
Subjects were told that to see the enclosed figures, 
they must ignore the unity of the enclosing figure 
and search out the smaller figure. 

Group III. The control group was given the Ship- 
ley-Institute of Living Scale for Measuring Intel- 
lectual Impairment (19). Time limits of 10 minutes 
each were allowed for the vocabulary and abstrac- 
tions subtests. 

Rorschach administration. The Rorschach was ad- 
ministered to all groups by projecting 35 mm. double 
frame Kodachrome slides of the blots. The projec- 
tions were faithful reproductions of the blots. Each 
of the ten slides was presented for one and one-half
minutes. Subjects were given answer booklets and
asked to write down what they saw or anything 
that the slides might represent to them. During the 
inquiry, the subjects were given sheets of achromatic 
reproductions of the blots on which to encircle their 
responses. Examples of whole and detail responses 
were given; the slides were projected again during 
the inquiry. 


Results 


The first response made to each card was 
scored for W and for Z, following Beck’s (2) 
procedure. As some subjects did not respond 
to all of the cards, separate analyses were also 
made of W and Z for the subjects who gave 
more than ten responses to the Rorschach. 

All analyses of variance for W and Z, except
for number of rejections, were statistically
significant. Tests for homogeneity of
variance showed nonsignificant differences 
(Table 2). As the direction of differences was 
predicted, one-tailed t tests were used to compute
the significance of differences between
means. 

When compared with the Gottschaldt-
trained group, the closure-trained group gave
significantly more W and Z responses. This
was so when t tests were computed using all
subjects, or when subjects were omitted who
gave fewer than 10 responses. The Gottschaldt-
trained group also produced significantly fewer
W and Z responses than did the control group.
As predicted, the closure-trained group gave
more W and Z responses than the control
group, but the differences were insignificant
(Table 3).


Table 2

Means, Sigmas, and Significance of Variance
Ratio (F) for W and Z

                                Group
Measure                 I        II       III        F         p

All subjects
  Mean W              6.21     4.31      5.65      3.45      .05
  Sigma W             2.07     1.75      2.18
  Mean Z             27.79    19.23     25.68      4.56      .05
  Sigma Z             8.00     7.11      8.66
  M Rej.               .63      .46       .53      S,'>S,'*

Subjects giving more
than 10 responses
  Mean W              6.47     4.31      5.62      3.85      .05
  Sigma W             2.20     1.73      2.25
  Mean Z             29.70    19.23     25.84      6.15      .01
  Sigma Z             7.46     7.11      8.91


Table 3

Significance of Differences Between the Group Means for W and Z

Groups compared     Diff. W     t      p*     Diff. Z     t      p*     df

All subjects
  I-II                1.90     2.60    .01      8.56     2.96   .005    30
  I-III                .56      .83    .25      2.11      .79   .25     34
  II-III              1.34     1.79    .05      6.44     2.18   .025    28

Subjects giving more
than 10 responses
  I-II                2.16     2.76    .01     10.47     3.48   .005    26
  I-III                .84     1.14    .15      3.86     1.35   .10     29
  II-III              1.32     1.71    .05      6.61     2.23   .025    27

* p less than value indicated with one-tailed test.
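
The group comparisons reported in Table 3 are t tests between pairs of means. As a rough illustration of how such a value follows from summary statistics, the sketch below computes a pooled-variance t from two means, two sigmas, and two group sizes. It makes two assumptions that the report does not confirm: each sigma is treated as a sample standard deviation (n - 1 in the denominator), and the group sizes are those listed in Table 1. The result therefore need not reproduce the printed t or df exactly.

```python
from math import sqrt


def pooled_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (t, df) for two independent groups using a pooled variance.

    Assumes sd_1 and sd_2 are sample standard deviations (n - 1 denominator).
    """
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / standard_error, df


if __name__ == "__main__":
    # Mean W and Sigma W for Groups I and II (all subjects, Table 2);
    # the group sizes 17 and 13 are taken from Table 1 as an assumption.
    t, df = pooled_t(6.21, 2.07, 17, 4.31, 1.75, 13)
    print(f"t = {t:.2f} with {df} df")
```

Because the direction of each difference was predicted, the resulting t would be referred to a one-tailed critical value.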


Discussion 


The random design of this study has certain
definite advantages over a test-retest design
where groups are equated by means of
their pretest performance. If subjects had had 
experience with the Rorschach, or with any 
other inkblot test, the response set established 
during this previous test might act to reduce 
or to eliminate the effectiveness of the set es- 
tablished in the training period. Since W and 
Z are dependent upon R, data analysis was 
confined to first responses to each slide to 
control for number of responses. Other pos- 
sible methods of controlling R, matching on 
R after training and use of covariance technique
(8), did not seem appropriate as it was
not known how the experimental treatments
might affect responsiveness, and the regression
of W and Z on R could not a priori be
expected to be linear. Asking subjects for an 
arbitrary number of responses was decided 
against, and Cronbach’s (6, p. 426) sugges- 
tion of scoring a fixed number of responses 
was followed. If training results in changes in 
organization value and in number of W re- 
sponses given, then it is reasonable to expect 
that the effects of such training would be as 
apparent on the first response to a card as on 
later responses. If training effects did not ap- 
pear in first responses, or if they were not 
demonstrable with the fairly small number of 
subjects used, then they would probably be 
of little practical importance. 





Exactly what was transferred from training 
to test period is unanswered in this study. 
Perhaps a way of responding had been
learned, or perhaps a set was transferred which
temporarily influenced the responses made. We
might presume that some implicit attitude to- 
ward what the subject thought it was that he 
was to do may have been transferred. How- 
ever, we can conclude from the results that 
when homogeneous groups with no previous 
experience with the Rorschach test are given 
different types of perceptual training for a 
short period of time, in a situation separate 
from the Rorschach itself, significant differ- 
ences result in their tendency to produce W 
responses and to give organized responses. 
Thus, the groups present rather different pic- 
tures in their Rorschach records. The Gott- 
schaldt group seems to possess a moderate 
amount of intelligence of a practical sort, 
while the Mooney group and the control 
group appear to possess a higher, more ab- 
stract type of intelligence. Apparently it was 
easier to eliminate tendencies to organize and 
to give W’s, than it was to increase this 
tendency. Perhaps training to increase the W 
and Z scores must be accompanied by in- 
structions to see W’s, or by an increasing 
motivation of subjects to give W’s by in- 
creasing their ego-involvement in the task. 

The facility with which the W and Z scores 
were altered indicates a need for caution in 
interpreting these scores in the light of hy- 
potheses advanced for them in Rorschach 
theory. Perhaps greater structuring of the 
test situation might lessen the influence of 
extraneous factors on the protocol. However,
Rorschach enthusiasts may defend the test 
on grounds that these changes in records are 
superficial and occurred in an unusual set- 
ting. They might claim that when a record is 
interpreted as a whole, by a skilled clinician, 
an accurate picture of the personality will 
emerge despite changes in a few scores. Clini- 
cal examination of the protocols, as well as 
the statistics of this study, impress one with 
the differences between the responses given 
by the various groups. 

If a test is very susceptible to situational 
influence, then for a clinician to use it and 
to make accurate predictions for a particular 
subject will require years of training at skills 
that are imperfect and never readily com- 
municated to others. Objective personality 
variables need to be more precisely defined 
and more accurately measured, with increased 
effort at reducing the intuitive aspects of the 
measuring instruments. Only by this refine- 
ment will better predictions of behavior be 
made available. 


Summary 


The purpose of this study was to investi- 
gate some variables which influence the or- 
ganization of Rorschach responses and re- 
sponse to whole blot areas. Forty-nine volun- 
teer college males were randomly assigned into 
one of three groups. Two groups received 
training on perceptual tasks, either the 
Mooney Closure Figures, or the Gottschaldt 
figures, before taking the Rorschach. The 
control group was administered the Shipley-Institute
of Living Scale for Measuring Intellectual
Impairment, a verbal and nonperceptual
test, during its training period. Following
the training and a short recess, another ex- 
perimenter administered the Rorschach test, 
which was introduced as a second experiment. 
Since W and Z scores depend on the number 
of responses given, length of record was con- 
trolled by analyzing only the first response 
made to each card. Four simple analyses of 
variance were made of W and Z scores. Values 
of F were statistically significant and tests for 
the significance of the differences between 
groups were made. 

1. The group trained on the Gottschaldt
figures had significantly lower W and Z scores
than the other two groups. 

2. While the Mooney-trained group’s W 
and Z scores were higher than the control
group’s, as predicted, these differences were
statistically insignificant. 

So long as situational variables incidental 
to the testing situation can be suspected of 
creating a significant alteration in the W and 
Z scores, then the variance of these scores 
that is attributable to pervasive personality 
dimensions is in serious question. Thus, fur- 
ther caution must be exercised in the inter- 
pretation of Rorschach W and Z scores in 
light of the Rorschach hypotheses given for 
them. It is suggested that greater structuring 
of the test situation might decrease the tend- 
ency of W and Z scores to reflect transient 
sets. 


Rorschach Summary Scores in Differential Diagnosis¹


Irwin J. Knopf 


State University of Iowa²


The discrepancy between reports of clinical 
experience and research data has been a seri- 
ous dilemma in Rorschach circles for a num- 
ber of years. On the one hand, many clinicians 
have been impressed with the usefulness and 
the validity of the Rorschach method in a 
range of applications, while, on the other, re- 
search findings have not in the main sup- 
ported this confidence. One such application 
has been the considerable use of the Ror- 
schach as an aid in differentiating psychiatric 
disorders. 

Formulated on the assumption that differ- 
ences in psychiatric conditions would be re- 
flected in Rorschach data, early investigations 
were primarily concerned with descriptive re- 
ports of the typical Rorschach performance 
of one or more psychiatric populations. As 
a result, statistical and/or experimental con- 
trols were not generally employed, but in- 
stead, clinical description was accepted with- 
out more rigorous verification. Later, some 
workers attempted to isolate the qualitative 
differences purported between patient groups 
and yet preserve the “holistic” nature of the 
test findings by deriving patterns or signs 
which collectively seemed to discriminate 
among nosological groups. Thus, a variety of 
signs were reported, for example, signs which 
were found in brain-damaged individuals, in 
psychoneurotics, in schizophrenics, and which 
were useful as indices of adjustment in the 
evaluation of psychotherapy (17, 18, 14, 8, 
22, 15). However, when many of these signs 
were employed in subsequent investigations, 
the significant discriminations reported earlier
were not corroborated (16, 3, 21, 7, 11). The
fact that signs derived in the original investigations
provided better discrimination within
the initial sample than within subsequent
populations is not too surprising. In some of
these studies, signs were obtained by selecting
for prediction those few aspects of Rorschach
performance which showed the highest
relationships from among the available
predictors which had a low correlation with
the criterion. Such a procedure tends to capitalize
on chance fluctuations within the sample,
and consequently may result in a spuriously
high multiple-correlation coefficient.
When signs which were derived in this manner
are applied to subsequent populations, it
can be expected that the coefficient will be of
lower magnitude and of less predictive value
than that obtained in the parent population.


¹ A portion of this paper was presented at the
American Psychological Association meetings in New
York, 1954.

² From the Dept. of Psychiatry, College of Medicine,
and the Division of Clinical Psychology, Iowa
Psychopathic Hospital.
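
The shrinkage described above can be illustrated with a small simulation. The sketch below is not a reconstruction of any of the cited studies: all scores and the criterion are pure random noise, and the sample size, number of scores, and number of retained "signs" are arbitrary assumptions. It picks the few scores that happen to correlate best with the criterion in one sample, combines them into a composite, and then applies the same composite to a fresh sample.

```python
import random
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def one_replication(rng, n_subjects=40, n_scores=50, n_keep=5):
    """Select the best 'signs' in one noise sample and cross-validate them."""

    def draw_sample():
        scores = [[rng.gauss(0, 1) for _ in range(n_subjects)]
                  for _ in range(n_scores)]
        criterion = [rng.gauss(0, 1) for _ in range(n_subjects)]
        return scores, criterion

    scores, criterion = draw_sample()
    rs = [pearson(s, criterion) for s in scores]
    # Keep the scores with the largest absolute correlations (chance only).
    keep = sorted(range(n_scores), key=lambda j: abs(rs[j]), reverse=True)[:n_keep]
    signs = {j: (1 if rs[j] >= 0 else -1) for j in keep}

    def composite(sample_scores):
        return [sum(signs[j] * sample_scores[j][i] for j in keep)
                for i in range(n_subjects)]

    r_derivation = pearson(composite(scores), criterion)
    new_scores, new_criterion = draw_sample()
    r_validation = pearson(composite(new_scores), new_criterion)
    return r_derivation, r_validation


def simulate(n_reps=100, seed=0):
    """Average derivation and cross-validation correlations over replications."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(n_reps)]
    mean_derivation = statistics.fmean(r for r, _ in results)
    mean_validation = statistics.fmean(v for _, v in results)
    return mean_derivation, mean_validation


if __name__ == "__main__":
    derivation, validation = simulate()
    print(f"mean r in derivation sample: {derivation:.2f}")
    print(f"mean r in new sample:        {validation:.2f}")
```

Even though no score has any true relation to the criterion, the composite correlates substantially with it in the sample used for selection and falls to about zero in the new sample.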

Other investigators have studied the extent 
to which Rorschach single summary scores 
can discriminate among psychiatric groups. 
Wittenborn and Holzberg (23) studied 39 
summary scores with five patient groups and 
found that one score (CF) significantly dis- 
criminated the manic from the depressed pa- 
tients. Friedman (6) evaluated the discrimi- 
native effectiveness of Rorschach scores with 
two groups of normal adults and 30 schizo- 
phrenics. He found eight Rorschach variables 
which significantly differentiated the schizo- 
phrenics from both groups of normal subjects. 
Reiman (19) evaluated 86 scores with repli- 
cated samples of ambulatory schizophrenics 
and neurotics. The results indicate that six 
scores were significant at the .10 level of con- 
fidence or better between the clinical groups 
for both samples. Kobler and Steil (12) reported
no statistical differences of any consequence
between the Rorschach scores of the
paranoid and depressive subgroups of involu- 
tional melancholic patients. 

While the findings with respect to single 
summary scores are predominantly negative, 
unequivocal evaluation of these data is diffi- 
cult. Most studies were able to obtain a few 
scores which discriminated among psychiatric 
groups. However, in some instances, the posi- 
tive findings could be expected by chance be- 
cause of the large number of tests of signifi- 
cance computed. It should also be noted that 
although a variety of Rorschach scores have 
been reported as differentiating diagnostic 
groups, there has not been a great deal of 
consistency for these scores to appear re- 
peatedly as diagnostic from study to study. 
In addition, certain methodological limita- 
tions such as small samples, incomplete sta- 
tistical or experimental controls, and vaguely 
defined diagnostic criteria have complicated 
the interpretation of the results. 
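
The point about the number of significance tests can be made concrete with a short calculation. The sketch below gives the number of "significant" scores expected by chance alone, and the probability of obtaining at least one, when many scores are tested. It assumes that the tests are independent and that no score truly discriminates; the .05 level used for the 39-score case is an assumption, since the level employed in that study is not stated here.

```python
def expected_chance_hits(n_tests, alpha):
    """Expected number of significant results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Probability of one or more chance 'hits' among independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    # 39 summary scores, assumed .05 level; 86 scores at the .10 level.
    for n_tests, alpha in [(39, 0.05), (86, 0.10)]:
        print(
            f"{n_tests} tests at alpha = {alpha}: "
            f"expected chance hits = {expected_chance_hits(n_tests, alpha):.2f}, "
            f"P(at least one) = {prob_at_least_one(n_tests, alpha):.3f}"
        )
```

On these assumptions, a study testing several dozen scores should expect a handful of nominally significant differences even when none is real, which is why scattered positive findings are hard to evaluate.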

In the light of the ambiguous nature of the 
research findings, and the extensive applica- 
tion of the instrument, the need for a sys- 
tematic evaluation of the Rorschach as a psy- 
chodiagnostic technique seems indicated. An 
investigative program of this sort is under 
way at the Iowa Psychopathic Hospital. The 
present study represents one major phase of 
this research program, and it specifically deals 
with the problem of determining the extent 
to which Rorschach summary scores can dif- 
ferentiate psychiatric groups. 


Procedure 


Our subjects were selected on the basis of 
the following criteria: (a) chronological age 
of 15 years or older; (b) unanimous agreement
among psychiatrists as to diagnosis both
on admission to and discharge from the hos- 
pital; (c) diagnosis was restricted to psycho- 
neurosis (Pn), psychopathy (Pp), or schizo- 
phrenia (Sc); (d) diagnosis was independent 
of the Rorschach data,³ and (e) the number
of Rorschach responses (R) would not contribute
to a significant difference in the mean
number or the variance of responses for the
three groups.


³ It was not possible to obtain complete independence
of diagnosis and Rorschach data. However, this
criterion was met to the extent that the initial impression
and the admission staff diagnosis were made
prior to the Rorschach administration.

Initially, over 800 case records of patients 
who were 15 years or older, and who were
given the Rorschach test during the six-year 
period from 1948 to 1953, were examined in 
order to check on the agreement and consist- 
ency of the psychiatric diagnosis. Each case 
folder contained an initial diagnostic impres- 
sion usually made by a psychiatric resident 
or a staff psychiatrist, an admission and staff 
diagnosis, and a final discharge staff diag- 
nosis, both of which were made by several 
residents and one or more staff psychiatrists. 
In this way 339 Rorschach records were ob- 
tained, and the number of responses per rec- 
ord was determined. Statistical tests showed 
no significant difference between the mean 
number of responses for the three groups, 
while the variances between the groups were 
significantly different. Inspection of the data 
indicated that two cases in the Pn group, each 
with 153 responses, were contributing greatly 
to the heterogeneity of variance. Consequently 
they were withdrawn from the sample and 
statistical tests of means and variances were 
recomputed. The groups showed no differ- 
ences in either the total number of responses 
or variances, so that the effects of R on other 
Rorschach scores for the three groups were 
considered approximately equal. 

A total of 337 Rorschach protocols meeting 
all the criteria and obtained from 131 Pn’s, 
106 Pp’s, and 100 Sc’s comprised the basic 
data for this study. In order to assure equal 
treatment of the data for all subjects, each 
protocol was rescored according to the scoring 
system described by Hertz (9).⁴ The incidence
of clinical types which were included 
within each of the three major diagnostic 
groups is presented in Table 1. Additional 
subject characteristics of the three groups are 
given in Table 2. From this it will be noted 
that there was an unequal number of males 
and females in each group, and that this was 
most discrepant in the Pp’s. The small dif- 
ferences in age between the groups were not 
statistically reliable, although the differences 
in education and length of hospitalization 

4 The author wishes to express his appreciation
to Donald Spangler for rescoring each Rorschach
protocol.
Table 1
Frequencies of Subclinical Types Within Each Major Diagnostic Group

Psychoneurotics            Psychopaths                  Schizophrenics
Mixed               31     Psychopathic pers.    38     Paranoid        46
Anxiety             30     Path. sexuality       18     Simple          13
Psychasthenia       28     Path. emotionality    14     Hebephrenic      8
Hysteria            22     Psychotic episodes    12     Mixed            7
Neurasthenia         5     Neurotic traits        8     Catatonic        4
Impulse neurosis     5     Inadequate type        7     Acute            3
Hypochondriasis      4     Asocial trends         3     Defect state     1
Psychosomatic        2     Alcoholism             2     Unclassified    18
Reactive depress.    2     Paranoid tendencies    2
Unclassified         2     Exhibitionism          1
                           Malingering            1
Total              131                          106                    100


were significant at the .01 level of confidence. 
The educational level of the Sc’s was slightly 
higher than that of the Pn’s and Pp’s, while 
the length of hospitalization was slightly 
shorter for the Pp’s than for the other two 
groups. Data were also available with regard 
to the subject’s previous admission and ill- 
ness. Thirty-seven per cent of the Pn group 
had been ill prior to this admission, whereas 
24 per cent of the Pp’s and 23 per cent of the 
Sc’s had been ill previously. Generally these 
figures indicate that approximately 72 per 
cent of the total sample were first admissions 
at the time of the Rorschach administration, 
and that their condition was not, for the most 
part, considered chronic. 
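
The pooled figure follows from weighting each group's percentage by its size. The short Python sketch below repeats that arithmetic using only the group sizes and percentages quoted above; the variable names are illustrative, and treating "first admission" as the complement of "previously ill" is an assumption. It gives roughly 71 per cent, in line with the approximate figure reported; the small gap presumably reflects rounding of the group percentages.

```python
# Group sizes and percentage previously ill, as quoted in the text.
groups = {"Pn": (131, 37), "Pp": (106, 24), "Sc": (100, 23)}

total_n = sum(n for n, _ in groups.values())
previously_ill = sum(n * pct / 100 for n, pct in groups.values())

# Assumption: "first admission" is taken as the complement of "previously ill".
first_admission_pct = 100 * (1 - previously_ill / total_n)
print(f"{first_admission_pct:.1f} per cent first admissions")
```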

Medians, means, and standard deviations 
were computed for each clinical group on the 
following summary scores: R, W%, D%,
Dr%, S, F%, F+% of 80, F+% of 60,
M, FM, Fm, FC, CF, C, Fch, chF, ch, Fch’,
ch’F, ch’, Fc, cF, c, A%, Ad, H%, Hd, Nature,
Blood, Sex, Anatomy, Object, Fire, Position,
Contamination, P, O%, and Rejection.
Inspection of these figures indicated marked 
differences in the medians and the means for 
many of the scores, supporting previous ob- 
servations that most Rorschach scores are not 
distributed normally (4, 5). In addition, almost
half of the scoring categories had medians
of zero and means of less than 1.0,
which not only suggests that these scores occurred
very rarely, but also that they have 
very limited utility in individual differential 
diagnosis. With the exception of the c, ch’, 
Position, and Contamination scores which oc- 
curred very infrequently, chi-square tests of 
independence were employed to evaluate the 
significance of differences between the groups 
for the remaining 34 summary scores. The 
over-all median derived from the total sam- 
ple (337) for each score was used as the em- 
pirical cutoff point. Yates’s correction for 
continuity was applied to the data wherever 


Table 2
Subject Characteristics for the Three Groups (N = 337)

                          Psychoneurotics    Psychopaths      Schizophrenics
Sex                       (56 M, 75 F)       (79 M, 27 F)     (57 M, 43 F)
                          Mean     SD        Mean     SD      Mean     SD
Age                       27.3     8.6       26.9     9.8     27.7     9.6
Educ.                     11.7     2.5       11.1     2.5     12.4     3.2
Length of hosp. (days)    58.1    43.7       40.5    37.5     55.8    45.3
Table 3
Medians, Means, and Sigmas for the Larger Clinical Groups on the Significant Scores and the
Probability Values Obtained from Intergroup Comparisons

               Pn (N = 131)          Pp (N = 106)          Sc (N = 100)         Intergroup chi-square value probabilities*
Score          Mdn    Mean   SD      Mdn    Mean   SD      Mdn    Mean   SD     (Pn-Pp, Pp-Sc, Pn-Sc) [column alignment lost]
Dr%            11.0   14.4   13.8     6.5   10.0   11.4    15.0   15.8   12.8   .01
A%             50.0   48.0   17.4    51.0   50.2   18.0    44.0   43.4   17.0   .01
FM              2.0    2.7    2.4     2.0    2.6    2.5     2.0    2.0    2.0   .02
P               5.0    5.1    2.2     5.0    5.2    2.5     4.0    4.1    2.0   .01  .02
Sex             0.0    0.5    1.2     0.0    0.2    1.0     0.0    0.9    2.8   .001 .01
Anatomy         1.0    2.5    3.3     1.0    1.7    2.4     1.0    2.6    4.3   .01  .01

* Only probability values at or beyond the .05 levels are reported.
cell frequencies were lower than ten (13).
The null hypothesis was retained with those
chi-square values which did not meet the
minimum requirement of the .05 level of
confidence.


Results


Although tests of significance were not
computed, the incidence of occurrence of
contaminated and position responses will be
presented because some Rorschach workers
have regarded these responses as diagnostically
important in that they are almost always
associated with schizophrenia or psychosis
(1, 2, 10, 20). Our data, however,
indicate that these responses can and do
occur in other psychiatric conditions. For
example, we found contaminated responses in
9 Sc’s, 2 Pn’s, and 1 Pp, and position responses
in 2 Sc’s, 5 Pn’s, and 1 Pp. Moreover,
when we consider that only 12 patients out
of the total sample of 337 produced contaminated
responses and only 8 patients produced
position responses, it is apparent that
diagnostic classification cannot be effectively
made solely on the basis of the presence or
absence of these responses.

The over-all chi-square tests applied to
each Rorschach score for the three clinical
groups resulted in significant values for the
following six scores: Dr%, A%, FM, P, Sex,
and Anatomy. Three additional chi-square
values were computed for each of these significant
scores to determine more specifically
which group or groups the scores discriminated
among. Table 3 lists the medians,
means, and standard deviations for these
scores, as well as the probability level obtained
from the separate comparisons of the 
clinical groups. Most apparent from this table 
is the failure of any one score to discriminate 
among all three groups, although three scores 
were discriminative in two comparisons. It 
will also be noted that five scores significantly 
differentiated the Pp’s from the Sc’s, whereas 
only two scores differentiated the Pn’s from 
the Pp’s, and the Pn’s from the Sc’s. Inspection
of the data and reference to the medians
and means listed in Table 3 indicate the direction
of significance. There were more Pp’s 
who were lower on Dr%, and higher on FM 
than Sc’s; more Pn’s who were lower on Dr%, 
and higher on A% than Sc’s; more Pn’s who 
were higher on FM than Sc’s; more Pp’s and 
Pn’s who were higher on P than Sc’s; and 
more Pn’s and Sc’s who were higher on Sex 
and Anatomy responses than Pp’s. 

The F+% score is often regarded as important
in differentiating psychotic from non-psychotic
subjects. However, the over-all median
value of 80 per cent which was obtained 
from the total sample did not discriminate 
among the clinical groups. Beck (2) has sug- 
gested 60 per cent as a diagnostically useful 
cutoff point, and consequently this score was 
employed with the present data. The chi- 
square values indicated significant differences 
between the Sc’s and the Pn’s, and the Pp’s 
and Pn’s at the .001 and .05 levels of con- 
fidence respectively, while no significant dif- 
ference was obtained between the Pp’s and 
the Sc’s. Recognizing that there are differ- 
Table 4
Medians, Means, and Sigmas for the Clinical Groups of 50 Cases on the Significant Scores and the
Probability Values Obtained from Intergroup Comparisons

               Pn (N = 50)           Pp (N = 50)           Sc (N = 50)          Intergroup chi-square value probabilities*
Score          Mdn    Mean   SD      Mdn    Mean   SD      Mdn    Mean   SD     (Pn-Pp, Pp-Sc, Pn-Sc) [column alignment lost]
Dr%             8.0   12.7   13.0     7.0    9.9   11.1    14.0   16.5   13.2   .01  .05
M               1.0    1.8    2.2     1.0    1.8    1.5     1.0    1.6    2.8   .01
Fch’            1.0    1.1    1.5     0.0    0.5    1.3     0.0    0.5    1.0   .01  .05
P               5.0    4.9    2.2     5.0    5.2    2.4     4.0    4.0    2.0   .01
Sex             0.0    0.6    1.6     0.0    0.1    0.4     0.0    1.0    2.5   .02
Anatomy         1.0    2.6    3.2     1.0    1.5    1.9     2.0    2.8    2.9   .05  .01

* Only probability values at or beyond the .05 levels are reported.



ences in the Hertz and Beck scoring systems 
which include F+ tables and procedures for 
computing F+%, these findings nevertheless 
suggest that Beck’s empirical cutoff point of 
60 per cent may also be discriminative with 
Hertz’s scoring method. 
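
The cutoff comparisons used throughout (the over-all median, or a fixed point such as 60 per cent) reduce each pair of groups to a 2 x 2 table of cases above versus at or below the cutoff. The sketch below illustrates one such comparison with Yates's correction. The scores and cutoff are hypothetical, the function names are illustrative, and the correction is applied unconditionally here, whereas the study applied it only where cell frequencies fell below ten.

```python
import math

def split_counts(scores, cutoff):
    """Return (number above cutoff, number at or below cutoff)."""
    above = sum(1 for s in scores if s > cutoff)
    return above, len(scores) - above

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates's continuity correction for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

# Hypothetical scores for two groups, not data from the study.
group_1 = [3, 5, 8, 9, 12, 14, 15, 18, 20, 22]
group_2 = [1, 2, 2, 4, 5, 6, 7, 9, 11, 16]
cutoff = 8  # hypothetical cutoff, e.g. an over-all median

a, b = split_counts(group_1, cutoff)
c, d = split_counts(group_2, cutoff)
chi2 = yates_chi_square(a, b, c, d)
p = p_value_1df(chi2)
print(a, b, c, d, round(chi2, 3), round(p, 3))
```

The over-all three-group tests would use a 2 x 3 table with 2 degrees of freedom; only the pairwise case is sketched here.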

In evaluating the results, it seemed impor- 
tant to consider the extent to which chance 
factors could account for the significance of 
differences between the groups for the seven 
Rorschach scores. Having computed 34 over- 
all chi-square values, we can expect approxi- 
mately two to be significant merely by chance 
at the .05 level of confidence. In the light of 
this possibility, a more rigorous examination 
of the stability of the present findings seemed 
essential. Therefore, 150 cases, 50 from each 
clinical group, were randomly selected from 
the total parent sample of 337. Statistical 
treatment and the analysis of the data for 
this sample was the same as that previously 
described for the parent population. Chi- 
square values were obtained separately for 
the 34 Rorschach scores, and only those 
scores which were significant on both sam- 
ples were regarded as stable. 
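
The chance expectation cited above is simply the number of tests multiplied by the significance level. A minimal sketch of that arithmetic follows; the probability of at least one chance finding additionally assumes that the 34 tests are independent, which correlated Rorschach scores are unlikely to satisfy.

```python
n_tests = 34   # over-all chi-square tests reported in the text
alpha = 0.05   # significance level

expected_by_chance = n_tests * alpha
# Assumption: the tests are independent of one another.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected chance findings: {expected_by_chance:.1f}")
print(f"P(at least one chance finding): {p_at_least_one:.2f}")
```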

The analysis of the data for the new sample 
revealed that the Dr%, M, Fch’, P, Sex, and 
Anatomy scores significantly differentiated the 
groups at the .05 level of confidence or better. 
The medians, means, and standard deviations 
for each score together with the probability 
levels obtained from the separate comparisons 
of the groups are shown in Table 4. These re- 
sults indicate the Pp’s and the Sc’s were sig- 
nificantly differentiated by five scores, whereas 


only two scores differentiated the Pn’s from 
the Pp’s, and the Pn’s from the Sc’s. In com- 
paring these findings with those obtained from 
the parent sample, we find that only the 
Dr%, P, Sex, and Anatomy scores signifi- 
cantly discriminated the clinical groups on 
both samples, and thus met the criterion of 
stability. While all four stable scores discrimi- 
nated the Pp’s from the Sc’s, these scores 
were less sensitive in differentiating the Pn’s 
from the Sc’s, and the Pn’s from the Pp’s in 
that only one score for each comparison was 
found to be significant with both samples 
(Dr% and Sex, respectively). 
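
The stability criterion amounts to intersecting the scores found significant in the two samples. As a brief illustration, using the score lists reported in the text (the variable names are illustrative):

```python
# Scores significant on the over-all tests in each sample, as listed in the text.
parent_significant = ["Dr%", "A%", "FM", "P", "Sex", "Anatomy"]
subsample_significant = ["Dr%", "M", "Fch'", "P", "Sex", "Anatomy"]

# A score is "stable" only if it is significant in both samples.
stable = [s for s in parent_significant if s in subsample_significant]
print(stable)
```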


Summary and Conclusions 


To determine the extent to which Ror- 
schach summary scores could discriminate 
among psychiatric populations, a total of 337 
carefully selected Rorschach records obtained 
from 131 Pn’s, 106 Pp’s, and 100 Sc’s were 
analyzed. Chi-square tests of independence 
were computed on 34 Rorschach summary 
scores for the total sample, and also for a 
second sample of 150 cases drawn randomly 
from the parent sample. Only scores which 
were discriminative on both samples were con- 
sidered stable. The results showed that: 

1. Most Rorschach summary scores are not 
normally distributed. 

2. Almost half of the scoring categories had 
medians of zero and means of less than 1.0, 
not only indicating the rareness of these re- 
sponses but also underscoring the limited 
utility of these scores for differential diag- 
nosis. 
3. Contaminated and position responses can 
and do occur, albeit infrequently, in all three 
groups and cannot be regarded as pathogno- 
monic of psychosis or schizophrenia. 

4. On an over-all basis, four scores: Dr%, 
P, Sex, and Anatomy significantly discrimi- 
nated among the groups on both samples at 
or beyond the .05 level of confidence. 

5. When specific tests of significance were 
made, no single summary score significantly 
differentiated all three clinical groups. 

6. For practical purposes, Rorschach sum- 
mary scores cannot be regarded as effective 
in differentiating psychiatric groups. 


Received July 19, 1955. 
Analysis of the Rorschach test by either 
the Klopfer or Beck scoring systems includes 
a comparison of the response productivity for 
those cards which are entirely chromatic (8-10)
with the number of responses for the remaining
cards. This factor in the Klopfer system,
the “8-9-10 per cent” (8-10%), is the 
percentage contributed by Cards 8, 9, and 10 
to the total number of responses. The stand- 
ard for this factor is based on statistical esti- 
mations according to the assumption of D re- 
sponsiveness and the distribution of D areas 
in the ten cards which gives an expectancy of 
40 per cent of responses in the last three 
cards. In Beck’s system the comparison is an 
“affective ratio” (AR), obtained as a quotient 
of responses in Cards 8-10 divided by the re- 
sponse total in Cards 1-7. He reports a mean 
of 0.60 (SD 0.19) for a group of normal 
adults, and considers 0.40 to 0.80 as the normal
range. Beck looks on this factor as “a 
measure of the readiness to quicken to life’s 
pleasurable experiences” (3), and Klopfer 
treats it as indicating “a responsiveness to 
stimuli from without which is even less under 
the conscious control of the subject than the 
use of action and color elements” (7). Simi- 
lar significances are assigned to deviations of 
this factor—when the ratio is high, the sub- 
ject is volatile, liable to excitement, and 
stimulated by environmental impact; when 


1 From the Veterans Administration Hospital,
Omaha, Nebraska.
low, he is under-responsive to emotion-toned
stimuli and inhibited or lacking in responsiveness
to strong environmental impact.

The literature discloses several studies dealing
with the validity of color as the determinant
of responsiveness in the 8-10% hypothesis.
Sappenfield and Buker (9), Dubrovner,
Von Lackem, and Jost (5), Perlman
(8), Buker and Williams (4), Allen
and Rosenzweig (1), using various patient
and normal groups, and all employing the
technique of achromatic presentations, agreed
in finding that the number of responses on
8-9-10 is not a function of color. It was
pointed out (8), however, that while the ex- 
perimental evidence indicates that color is not 
an active factor in the 8-10%, the possibility 
remains that there is validity in this factor as 
a personality variable. 

The purposes of the present study are, first,
to explore the validity of the 8-10% and the
AR as a gross personality variable insofar as
personality is implied in normal and psychodiagnostic
groups, and second, to supply some
normative data for the 8-10% hypothesis
and to supplement the norms for the affective
ratio.


Study 


The subject groups consisted of normal, 
neurotic, and schizophrenic adults, and nor- 
mal, behavior problem, and mentally defec- 
tive children. The neurotics and schizophren- 
Table 1
Subject Groups, Age, Response Total, AR, and 8-10%

                            Age             R               AR              8-10%
Group                 N     Mean    SD      Mean    SD      Mean    SD      Mean    SD
Adults
  Normal              181   39.31   10.89   28.36   16.24   .650    .238    38.2     8.1
  Neurotic             70   32.96    6.26   22.67   10.71   .594    .292    35.4    10.5
  Schizophrenic       119   29.81    6.21   27.24   18.75   .523    .245    32.7    10.5
Children
  Normal              196   12.97    3.14   27.20   12.29   .643    .248    37.8     8.9
  Feebleminded         39   12.95    2.09   14.77    3.34   .666    .299    38.4     9.3
  Behavior problem     49   12.37    3.02   20.67   10.14   .567    .177    35.4     7.0



ics were VA hospital inpatients diagnosed by 
the psychiatric staff independent of considera- 
tion of the Rorschach test 8-10% or AR; 
the behavior problem group was composed of 
children referred to a psychological service by 
schools and courts for behavior disturbances; 
the mental defectives were institutionalized; 
the normal subjects were residents of a small 
midwestern community² located in the same 
geographical locale from which the diagnostic 
groups came. 

The Rorschach tests were individually ad- 
ministered and the protocols analyzed for 
computation of the 8-10% and affective 
ratio. No record with fewer than 10 responses 
was included. 
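
Both indices can be derived from the number of responses given to each card. The sketch below follows the definitions stated in the introduction (AR as responses to Cards 8-10 divided by responses to Cards 1-7; 8-10% as the share of all responses given to Cards 8-10) and the exclusion of records with fewer than 10 responses. The function name and the sample record are hypothetical.

```python
def color_card_ratios(responses_per_card, min_responses=10):
    """Return (AR, 8-10%) for ten per-card response counts, or None if the record is excluded."""
    if len(responses_per_card) != 10:
        raise ValueError("expected response counts for Cards 1 through 10")
    total = sum(responses_per_card)
    first_seven = sum(responses_per_card[:7])
    last_three = sum(responses_per_card[7:])
    # Records with fewer than 10 responses were not included in the study.
    if total < min_responses or first_seven == 0:
        return None
    affective_ratio = last_three / first_seven
    eight_ten_pct = 100 * last_three / total
    return affective_ratio, eight_ten_pct

# Hypothetical record: two responses to each of the ten cards.
ar, pct = color_card_ratios([2] * 10)
print(round(ar, 3), round(pct, 1))
```

Because both indices use the same counts, one determines the other: 8-10% = 100 x AR / (1 + AR).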


Results 


The subject groups, identifying information, 
and findings for the 8-10% and AR are pre- 
sented in Table 1. All groups, excepting the 
neurotic and schizophrenic, consisted of both 


2 The records were taken from a larger study done 
under the supervision of Dr. Marshall R. Jones and 
supported by the University of Nebraska Research 
Council. 


males and females, and the data were first 
analyzed to determine sex differences within 
each group. Since statistical analysis revealed 
no conventionally-accepted significant difference
in the 8-10% or AR, the data for the 
males and females were combined. 

The differences were treated for significance
by means of the t test and, for the adult
groups, the normals and schizophrenics are
differentiated significantly by both the AR
and the 8-10% (t = 4.44 and 4.84, respectively,
both p < .001), and between the normals
and neurotics the difference on the 8-10%
achieves a level of conventional significance
(t = 2.00, p < .05). It was found, too,
that the age differences are significant at less
than .01 between all three groups, and the
differences in response total (R) are significant
between the normals and neurotics and
between the neurotics and schizophrenics.
Thus, further refinement of the data was at- 
tempted by selecting 45 subjects in each of 
the three classifications for the purpose of 
matching them on mean age and response 
total. Results for these matched groups are 
presented in Table 2. 
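
The t ratios for the unmatched adult groups can be checked against the summary statistics in Table 1. The text does not state which form of the t test was used; the sketch below assumes an unpooled standard error, and the function name is illustrative. Under that assumption the normal versus schizophrenic comparisons come out close to the reported values of 4.44 and 4.84.

```python
import math

def t_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t ratio from group means, SDs, and Ns, using an unpooled standard error (an assumption)."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Normal vs. schizophrenic adults, values as read from Table 1.
t_ar = t_from_summary(0.650, 0.238, 181, 0.523, 0.245, 119)
t_pct = t_from_summary(38.2, 8.1, 181, 32.7, 10.5, 119)
print(round(t_ar, 2), round(t_pct, 2))
```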


Table 2
Matched Adult Groups, Age, Response Total, AR, and 8-10%

                    Age             R               AR              8-10%
Group (N = 45)      Mean    SD      Mean    SD      Mean    SD      Mean    SD
Normal              35.29   6.86    24.09   11.10   .649    .212    38.4     7.6
Neurotic            34.53   5.76    24.07   11.62   .604    .320    36.8    11.0
Schizophrenic       34.02   4.95    24.16   13.72   .455    .223    29.9     9.4
Table 3
Differences on AR and 8-10% for the Matched Group

                               AR                8-10%
Groups                         t       p         t       p
Normal vs. Schizophrenic       4.22    <.001     4.66    <.001
Neurotic vs. Schizophrenic     2.53    <.02      3.17    <.01

Table 4
Subjects, Cutoff Scores, Number, and Percentage of Cases for Both the AR and 8-10%

                   Unmatched groups                     Matched groups
                          No. cases                            No. cases
                          AR ≤ .334                            AR ≤ .334
Group              N      8-10% ≤ 25      %             N      8-10% ≤ 25      %
Normal             181     9               4.97         45      1               2.2
Neurotic            70    10              14.29         45      6              13.3
Schizophrenic      119    33              27.73         45     16              35.55

Analysis of these findings reveals that none 
of the differences between the normals and 
neurotics is statistically significant at the ac- 
ceptable levels of confidence. Significant dif- 
ferentiation can be made, however, between 
the normals and schizophrenics and between 
the neurotics and schizophrenics on both the 
AR and 8-10% measures, as listed in Table 3. 

For the children groups, inspection of 
Table 1 reveals that both the AR and 8-10% 
indices significantly differentiate the Behavior 
Problem group from both the Normals and 
Mental Defectives, but the differences be- 
tween the latter two groups are not statisti- 
cally significant. It is interesting to note the 
similarity of the mean ratios and their stand- 
ard deviations for the normal children and 
normal adults. Further analysis of the data 
for the children population has not been made. 

In the interest of the practical significance 
of these ratios, a cutoff point was sought 
which would afford the maximum differenti- 
ating effectiveness of the measures. Table 4 
presents the number and percentages of cases 
in each of the groups falling at or below an 
AR of .334 (or the equivalent 8-10%). 

All of these differences are significant, and 
it is felt that these findings support inter- 
pretation of the occurrence of a very low AR 


or 8-10% as contributing evidence of pa- 
thology of a schizophrenic or neurotic nature. 
No effective cutoff point could be made at the 
upper end of the ratio distribution since all 
three groups showed a similar scatter of a 
few cases within the upper range. This, it is 
felt, suggests that a high AR or 8-10% is 
not differentially useful. 
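
The equivalence between the two cutoffs, and the percentages in Table 4, can be verified with a few lines of arithmetic. In the sketch below the variable names are illustrative, and the normal-group count of 9 is the value implied by the reported 4.97 per cent of 181 cases.

```python
ar_cutoff = 0.334
# 8-10% is responses on Cards 8-10 over all responses, so 8-10% = 100 * AR / (1 + AR).
equivalent_pct = 100 * ar_cutoff / (1 + ar_cutoff)

# (cases at or below the cutoff, group size) for the unmatched groups in Table 4.
# The count of 9 for the normals is inferred from the reported 4.97 per cent of 181.
unmatched = {"Normal": (9, 181), "Neurotic": (10, 70), "Schizophrenic": (33, 119)}
percentages = {group: 100 * k / n for group, (k, n) in unmatched.items()}

print(round(equivalent_pct, 1))
for group, pct in percentages.items():
    print(f"{group}: {pct:.2f}")
```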


Discussion 


The results obtained show that the AR and 
8-10% do have some discriminating value for 
personality assessment at the level of broad 
psychodiagnostic classification. Confining at- 
tention to the adult populations, it is seen 
that the normals and schizophrenics differ 
significantly with respect to mean AR, and 
that the neurotics, more like normals than 
schizophrenics, are consistent in the direction 
of deviation from the other two groups. It 
would probably be agreed that there are per- 
sonality variables related to normality, neu- 
roticism, and schizophrenia which seem to 
vary on a continuum of intensity in this or- 
der. In terms of personality disturbance, the 
neurotics as a group, while heterogeneous, are 
characterized in comparison with normal in- 
dividuals by the presence of dysfunction, and 
schizophrenics as even more disordered. Similarly,
this view may be applied to the results 
obtained from the child populations in the 
sense that a group of behavior problem chil- 
dren differs from their normal peers by dis- 
turbances of personality, whereas perhaps the 
greatest difference between normal and men- 
tally defective children is more exclusively 
one of intellect than grosser personality func- 
tioning, so that these two groups tend to 
stand together in contrast with behavior 
problem disorders. 

It is felt that the studies which conclude 
that productivity on Cards 8-10 is not sig- 
nificantly influenced by the presence of color 
prevent interpretation of these ratios in terms 
of emotional characteristics of the psycho- 
diagnostic groups or of the traditionally-held 
relationship of emotional reactivity to color 
stimuli on the Rorschach test. Findings of this 
study, however, support Perlman’s suggestion 
that perhaps this factor is related to a per- 
sonality variable, and supplies the basis for 
further investigation along such lines. The 
data reported herein may be of use norma- 
tively for both the 8-10% and affective ratio. 


Summary 


The AR and 8-10% ratios were computed 
from the Rorschach protocols of 654 subjects, 
representing groups of normal, neurotic, and 
schizophrenic adults and normal, behavior 
problem, and mentally defective children. 
Significant mean differences between some of 
the groups are obtained which suggest a re- 
lationship between the ratios and gross per- 
sonality differences such as are implied in 
normal and psychodiagnostic groups. A cut- 
off point is suggested in the interest of the 
practical significance of a low ratio in dis- 


criminating normal individuals from those 
with neurotic or schizophrenic pathology. 
Findings from earlier studies are cited as the 
basis for caution in interpreting these ratios 
in terms of emotional reactivity to color 
stimuli on the Rorschach test. 

The present findings are offered as some 
normative data for the 8-10% and as sup- 
plemental norms for the affective ratio. 


Received June 27, 1955. 
of Cognitive Material¹ 


E. Robert Sinnett and Ruth Roberts 


Student Counseling Bureau, University of Minnesota 


The purpose of this study was to test hy- 
potheses about relationships between the way 
an individual organizes the Rorschach stimuli 
and the way he thinks in cognitive tasks. 

The manner in which a subject organizes 
his percepts on the Rorschach is commonly 
interpreted by authorities on the test (e.g., 
1, 8, 9) as yielding information about his 
cognitive processes. A preponderance of whole 
(W) responses indicates that the individual 
tends to generalize or think in an abstract 
manner. Emphasis on the usual details (D) 
of the blots signifies that the subject is at- 
tentive to the obvious, practical aspects of 
his environment, and seeing a large propor- 
tion of rare details (Dd) means that the per- 
ceiver may be pedantic and overly concerned 
with trivial aspects of his environment. Thus, 
the percentages of W, D, and Dd are used 
to make inferences about thought processes. 
These proportions are referred to collectively 
as the approach type (Ap). 


Method 


Subjects. The subjects were University of 
Minnesota students who were attending classes 
in the Educational Skills Clinic at the Stu- 
dent Counseling Bureau. Two samples were 
drawn: the initial group was composed of 43 
students, and the cross-validation group con- 
sisted of 69 subjects. Both samples were ex- 
haustive: all students attending classes dur- 
ing the fall quarter and the winter quarter of 


1 This study was supported by a research grant 
from the Graduate School of the University of Min- 
nesota. 

2We wish to express our appreciation to Mrs. 
Dorothy F. Nicholas for suggestions and help in the 
early phases of this study. 


the 1953-54 academic year participated in 
the experiment. 

Rorschach. A group Rorschach was ad- 
ministered to groups averaging about ten in 
size. The instructions for the free association 
were paraphrased from Beck (1) and the in- 
quiry for the location of the responses is de- 
scribed by Harrower and Steiner (5). No in- 
quiry was performed in order to establish the 
determinants of the responses. 

Reading comprehension. After surveying 
the reading instruments available at the col- 
lege level, the paragraph reading comprehen- 
sion section (items 81-100) of the Diagnostic 
Reading Tests, Survey, Form B, was chosen 
mainly because the multiple choice answers 
could be reliably coded in terms of generali- 
zations and details for the purpose of this 
study. 

In addition, the items have alternatives 
which are “true” in the sense that they are 
factually presented in the stories, but incor- 
rect in answer to the question. Thus answer- 
ing a question confronts the subject not only 
with the problem of selecting an alternative 
which appears in the paragraph, but he must 
discriminate the generalization or detail rele- 
vant to the question from irrelevant generali- 
zations and details. Unfortunately there were 
no alternatives which were “true” but ir- 
relevant generalizations and also there were 
few incorrect generalization alternatives. The 
items can be classified into two groups: (a) 
those where the subject must select the rele- 
vant generalizations from among alternatives 
including “true” but irrelevant details, incorrect
details, and possibly an incorrect generalization,
and (b) those where the subject 
must choose the appropriate detail from 




“true” but irrelevant details and incorrect 
details. Ten items involved selecting the ap- 
propriate generalization and seven involved 
selecting the appropriate detail. Three items 
were unclassifiable using the criterion of 
agreement by three of four judges. 

Since most of the item alternatives which 
were generalizations were right answers, a 
preference for generalizations in answering 
items would lead to more correct answers to 
questions directed toward a main idea. Con- 
sequently it was predicted that an emphasis 
on W (as indicated by number and percent- 
age of W) would be positively related to at- 
taining the correct generalizations on the read- 
ing test, but an emphasis on details (D+ 
Dd) would be negatively related to attaining 
correct generalizations on the criterion. 

Another index of intellectual organizing ac- 
tivity is Beck’s Z. This takes into account 
not only the subject’s attempt to organize 
the stimuli into wholes, but also includes the 
integration of details commonly perceived as 
separate. Beck interprets Z as “. . . the ca- 
pacity to grasp relations not perceived by 
others” (1, Vol. 2, p. 12). Since it has been 
found that scoring Z in unit weights bears 
such a high relationship to Beck’s weighted 
scores (10), unit scores were used in this 
study. It was predicted that Z would be posi- 
tively related to attaining correct generaliza- 
tions and correct details on the reading test. 

An overemphasis on W in the Rorschach 
protocol is interpreted as a lack of attention 
to the practical, obvious aspects of problems. 
Hence, it was predicted that W and W% 
would be inversely related to attaining cor- 
rect answers to the detail items on the read- 
ing test, but an emphasis on details (D + 
Dd) on the Rorschach would be positively 
associated with attaining correct answers to 
detail items.³ 

In order to test these hypotheses independ- 
ent of the effects of reading speed, each of 
the Rorschach variables used above was re- 
lated to the percentage of correct generaliza- 


3 Originally, we had planned to use number and 
percentage of D and Dd, but the number of Dd re- 
sponses was too few to warrant a separate analysis 
(M = 1.25). This finding is comparable to that of 
Harrower-Erickson and Steiner (5) who obtained a 
mean of 1.3 Dd responses. 


tions of the generalization items attempted 
by the subject, and the percentage of correct
details of the detail items attempted.⁴ 

Thus, sixteen tests were made relating the 
Rorschach variables (W%, number of W re- 
sponses, number of [D+ Dd] responses, and 
Z) and the reading comprehension variables 
(number and percentage of correct generali- 
zations, and number and percentage of cor- 
rect details). 
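
The count of sixteen follows from crossing the four Rorschach variables with the four reading comprehension variables. A brief sketch of that enumeration (the labels are paraphrased from the text):

```python
from itertools import product

rorschach_vars = ["W%", "number of W", "number of (D + Dd)", "Z"]
reading_vars = [
    "number of correct generalizations",
    "percentage of correct generalizations",
    "number of correct details",
    "percentage of correct details",
]

# Every Rorschach variable is related to every reading comprehension variable.
planned_tests = list(product(rorschach_vars, reading_vars))
print(len(planned_tests))
```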

To test the above hypotheses independent 
of the effects of intelligence, the Vocabulary 
section included in the reading comprehension 
test was used for statistical control. Although 
ACE scores were available on 84 per cent of 
the sample and it has been more thoroughly 
studied and used as an intellectual measure, 
we felt it would have been uneconomical to 
discard 16 per cent of the sample in order to 
have complete data for each case. For a sam- 
ple of 36 cases in the first group the Vo- 
cabulary test correlated .59 (p < .01) with 
the ACE. This lends support to the choice of 
this measure as an index of intelligence. 

Story recall. One might argue that the 
above predictions would not be confirmed be- 
cause the Rorschach gives a measure of how 
the individual organizes ambiguous stimulus 
material, whereas in more structured situa- 
tions such as the reading comprehension test, 
the subject may be able to shift his set in ac- 
cord with the structure provided by the ques- 
tions. For example, if a subject with a high 
W% were asked a question involving the dis- 
crimination of relevant from irrelevant de- 
tails, he might shift in accord with the set 
provided by the question. W% itself is amen- 
able to shift as a function of the set given by 
the experimenter (3). 

For this reason another criterion task was 
selected to see if the Rorschach variables 
would predict performance in a less struc- 
tured task. For this purpose free recall of 
stories containing both generalizations and 
details was chosen. The stories used in this 
study may be found in Center and Persons 
(2, pp. 28-30). Each story contained content 
which could be coded into two generalizations 
and two details. After reading each story the 


4 These analyses were suggested by Dr. Bernard S. 
Aaronson. 
subjects were told to write a brief summary 
of the material they had just read. 

The hypotheses relating Rorschach vari- 
ables and this criterion are analogous to those 
relating Rorschach variables and the previous 
criterion. 


Results 


Four judges were used to establish agree- 
ment in designating the 80 multiple choice 
alternatives of the 20-item reading compre- 
hension test as details or generalizations. Each 
alternative was judged to belong to one of 
two categories: generalization (G) or detail 
(D’). The means of the six interjudge agree- 
ments were as follows: G, 77 per cent, and D’, 
72 per cent. All interjudge agreements were 
significant beyond the one per cent level of 
confidence, using χ² for 1 df.

Interjudge agreement on whether or not an 
alternative was true in terms of adhering to 
the material presented in the reading para- 
graphs averaged 79 per cent for the six in- 
terjudge comparisons, and all interjudge com- 
parisons were significant beyond the .001 level 
of confidence using χ² for 1 df.

The alternatives apparently can be reliably 
coded into correct and incorrect generaliza- 
tions and details and also can be coded as 
true or not true. 
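
With four judges there are six possible judge pairs, which is where the six interjudge agreements come from. The following is a minimal Python sketch of mean pairwise percentage agreement; the judge names and the five codings are hypothetical illustrations, not the study's data.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Percentage of items that two judges coded identically."""
    if len(a) != len(b):
        raise ValueError("both judges must code the same items")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pairwise_agreements(codings):
    """Map every pair of judges to their percentage agreement."""
    return {
        (j1, j2): percent_agreement(codings[j1], codings[j2])
        for j1, j2 in combinations(sorted(codings), 2)
    }


# Hypothetical codings of five alternatives (G = generalization, D = detail).
codings = {
    "judge_1": ["G", "D", "G", "D", "G"],
    "judge_2": ["G", "D", "G", "G", "G"],
    "judge_3": ["G", "D", "D", "D", "G"],
    "judge_4": ["G", "G", "G", "D", "G"],
}

agreements = pairwise_agreements(codings)
mean_agreement = sum(agreements.values()) / len(agreements)
```

Each pairwise agreement could then be tested against chance with a 2 × 2 χ² for 1 df, as the text describes.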

For the story recall criterion task, the scor- 
ing agreement for two judges ranged from 80 


per cent to 95 per cent for scoring the recall
of the two generalizations and two details of 
Story 1 (2, p. 28). The agreement was com- 
puted separately for each category vs. all 
other categories and all the relationships were 
significant beyond the .001 level of confidence.
For Story 2 (2, pp. 29-30) the range of in- 
terjudge agreement was from 65 per cent to 
78 per cent. One of the generalizations could 
not be coded reliably, and, for the second, 
agreement was significant at the .05 level of 
confidence. Coding of both details was sig- 
nificant beyond the .01 level of confidence. 
Clearly Story 2 was less satisfactory for scor- 
ing agreement. One of the generalizations of 
Story 2 was excluded from the analysis be- 
cause interjudge agreement was not statisti- 
cally significant. 

For testing the significance of the relation- 
ships between the Rorschach and the criteria, 
the analysis was done using the tables by 
Mainland and Murray (7) for the median 
test. These tables were used as an approxi- 
mation since one-sided tests seemed appro- 
priate. In addition, for all relationships scat- 
ter plots were made to locate optimum 
cutting points for possible cross validation. 
Where relationships approached significance, 
Pearson correlation coefficients were com- 
puted to permit statistical control of intelli- 
gence by means of partial correlation. 

Of the sixteen relationships predicted be- 


Table 1

Selected Correlations Between Predictors and Reading Comprehension Variables

                                            Sample 1        Sample 2
Reading comprehension*                       r      p        r      p

Rorschach
  Z vs. no. correct generalizations         .46    .01      .14    n.s.
  Z vs. % correct generalizations           .28    .05      .20    .05
  W vs. % correct generalizations           .29    .05      .11    n.s.
  W vs. % correct details                  −.30    .05      .12    n.s.
Vocabulary
  Vocab. vs. no. correct generalizations    .62    .01      .49    .01
  Vocab. vs. % correct generalizations      .25    .05      .39    .01
  Vocab. vs. correct details                .33    .05      .29    .01
  Vocab. vs. % correct details              .31    .05      .23    .05
  Vocab. vs. Z                              .33    .05      .00    n.s.

* % refers to percentage of correct responses to items of a given category attempted by the subject.
Table 2

Selected Partial Correlations Between Rorschach Variables and Reading Comprehension
Variables with Vocabulary Held Constant

                                          Sample 1             Sample 2
Reading comprehension*                 Partial r    p       Partial r†   p

Z vs. no. correct generalizations         .35      .01         —         —
Z vs. % correct generalizations           .22      .10        .21       .05
W vs. % correct generalizations           .27      .05         —         —
W vs. % correct details                  −.28      .05         —         —

* % refers to percentage of correct responses to items of a given category attempted by the subject.

† The missing entries were not computed because the zero-order r’s were not statistically significant (see Table 1).


tween the Rorschach and the reading compre- 
hension measures, four of the zero-order r’s 
were significant beyond the .05 level of con- 
fidence in the expected direction (see Table 
1). After partialling out the effects of intelli- 
gence (see Table 2), three of these relation- 
ships were significant beyond the .05 level of 
confidence and one was significant beyond 
the .10 level of confidence. On the cross- 
validation sample only one of these partial 
correlations attained statistical significance. 
Beck’s Z is significantly related to the per- 
centage of correct generalizations of the gen- 
eralization items attempted when Vocabulary 
is held constant. The partial correlations were 
small (.22, first sample, and .21, second sam- 
ple) but statistically significant (p < .10 and 
p < .05, respectively). 
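
The control for intelligence can be illustrated with the standard first-order partial correlation formula. The sketch below is in Python and uses the rounded Sample 1 values as read from Table 1; the function name and argument names are illustrative, and the article does not state which computing formula was actually used.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Sample 1 zero-order r's as read from Table 1 (rounded to two places):
#   x = Z, y = % correct generalizations, z = Vocabulary
#   Z vs. % correct generalizations          = .28
#   Z vs. Vocabulary                         = .33
#   Vocabulary vs. % correct generalizations = .25
z_vs_pct_generalizations = partial_r(r_xy=0.28, r_xz=0.33, r_yz=0.25)
```

With these rounded inputs the result is about .22, which agrees with the Sample 1 partial r in Table 2. Small discrepancies for other rows are to be expected, since the published zero-order r's are rounded.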

Vocabulary is related in the expected direc- 
tion to all four of the reading comprehension 
criteria beyond the .05 level of confidence in 
both samples. 

For testing the hypotheses about the rela- 
tionships between the Rorschach variables 
and the story recall, the story criteria were 
inspected independently of the Rorschach 
data for suitable cutting points. For both 
stories, recall of a generalization vs. no re- 
call of a generalization and recall of one or 
more details vs. no recall of details, appeared 
to be meaningful cutting points that were 
near a median split of the population. Tables 
for 2 × 2 tests (7) were used as an approxi-
mation and where a relationship approached 
significance Fisher’s Exact Test was com- 
puted, since one-tailed tests were desired. 
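
For readers who wish to see what a one-tailed Fisher's Exact Test involves, the following Python sketch computes the hypergeometric tail probability for a 2 × 2 table. The function name and the example counts are hypothetical; the article reports no cell frequencies.

```python
from math import comb


def fisher_exact_one_tailed(a, b, c, d):
    """P(top-left cell >= a) with all margins fixed, for the table [[a, b], [c, d]]."""
    row1 = a + b          # first row total
    col1 = a + c          # first column total
    n = a + b + c + d     # grand total
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(n, col1)


# Hypothetical table: rows = high vs. low Z, columns = recalled vs. did not
# recall a generalization.
p_value = fisher_exact_one_tailed(3, 0, 0, 3)
```

The tail is taken in the direction of a larger top-left cell, so the table should be arranged with the predicted association on the main diagonal.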

In all, twenty tests were made (number of 
W, W%, number of [D + Dd], Z, and Vo- 


cabulary vs. the recall of generalizations and 
the recall of details for both stories) and none 
was statistically significant. Therefore the 
story recall task was omitted in the cross- 
validation study. 

In order to determine if subjects’ behavior
in the criterion tasks was consistent, the
interrelations of the reading comprehension
measures and the story recall measures were
determined. Of ten tests of significance, three
were significant beyond the .05 level of
confidence. The number of correct
generalizations on the reading comprehension
test was significantly positively associated
with the recall of generalizations on Story 1
(χ² = 3.05, 1 df, p < .05 in predicted
direction), the percentage of correct
generalizations in the reading comprehension
test and the recall of generalizations on
Story 1 were significantly related (p < .01),
and the recall of details on Story 1 was
significantly positively related to the recall
of details on Story 2 (χ² = 3.40, p < .05 in
predicted direction). The absence of some of
the expected relationships may be due to the
rather low interjudge agreement in scoring the
generalization on Story 2.

There is some evidence that the criteria are 
measuring the same thing, but they are not 
highly related. This finding is congruent with 
Goldner’s (4) data which showed low but 
positive interrelations among his measures of 
approach in problem-solving tasks. 


Discussion 


In contrast with a study by Goldner (4) 
there was no evidence of a relationship be- 
tween Rorschach Ap and approach in other 
tasks. The difference in outcome may be at- 
tributable to the kinds of criterion tasks used.
Goldner used cognitive tasks such as the 
Kohs blocks, anagrams, etc., whereas this 
study employed meaningful verbal material. 
Perhaps Ap is related to behavior in ap- 
proaching some kinds of problems, but not to 
the comprehension and recall of meaningful 
verbal material. Only Z was found to show 
promise for predicting performance in this 
domain. 

Since Z gives more weight to the qualitative 
aspects of performance than the location cate- 
gories, perhaps future studies should focus 
more on the qualitative aspects of Rorschach 
performance. For research of this sort, it 
would seem advisable to use the individual 
Rorschach since a more thorough inquiry is 
feasible than when using a group technique. 
Another advantage of using the individually 
administered record is the greater production 
of unusual details (6) than in the group 
technique. 


Summary 


This study was undertaken in order to test 
hypotheses about the relationship between 
the organization of the Rorschach stimuli and 
the organization of cognitive material. A sam- 
ple of 43 subjects seeking help for reading 
and study skills was given the Diagnostic 
Reading Test, Survey Section, Form B, a 
group Rorschach, and a free recall task. A 
cross-validation sample of 69 subjects was 
given the same battery excluding the free re- 
call task. The findings are as follows: 

1. There is no support for the hypotheses 
relating Rorschach approach and thinking in 
terms of generalizations or details using either 
a structured or an unstructured cognitive task. 

2. There is evidence that Z, a measure of 
organizational activity on the Rorschach, is 


related to selecting more highly organized re- 
sponses in a structured cognitive task when 
intelligence and speed are controlled. 

3. When compared with the Rorschach 
variables used in this study, vocabulary, an 
intellectual measure, is a better predictor of 
choice of relevant details and generalizations 
from irrelevant generalizations and details re- 
gardless of whether speed is controlled. 

4. Neither the Rorschach variables nor vo- 
cabulary predicted the free recall of generali- 
zations or details from story material. 

5. There is some evidence of consistency in 
the part-whole approach among verbal tasks. 


Received June 23, 1955. 
The Relation of Rorschach Indices of Extratension
and Introversion to a Measure of Responsiveness
to the Immediate Environment ¹


Lester Mann ²


Camden Area Mental Hygiene Clinic 


One of the basic assumptions underlying the 
Rorschach test’s indices of extratension and 
introversion is that the former are indicative 
of an orientation toward the environment and 
the latter of one away from it and toward the 
self. An accumulating literature has come to 
support this assumption (6, 8, 10, 12, 13, 14). 

The present research is an attempt to con- 
tribute further to its validation through the 
test of a logical corollary. Assuming that ex- 
tratension is characterized by responsiveness 
to the immediate environment and introver- 
sion by a lack of such responsiveness, the in- 
vestigation sought to demonstrate relation- 
ships between Rorschach E-I indices and an 
operationally defined measure of responsive- 
ness to the immediate environment. 

The following Rorschach E-I indices were 
selected for study: M, C Total, C Sum, M:C 
Total, M:C Sum, FM+m, Fc+c+C’,
FM + m:Fc+c+C’. The first five indices 
are those originally proposed by Rorschach 
(11); the latter three were later evolved by 
Klopfer (7). 

The criterion measure of responsiveness to 
the immediate environment was operationally 


1 This article is based on a dissertation submitted 
in partial fulfillment of the requirements for the de- 
gree of Doctor of Philosophy, University of North 
Carolina, 1953. The major part of the investigation 
was carried out while the writer was a trainee in the 
VA Clinical Psychology Training Program. The guid- 
ance of Drs. Harry W. Crane, Joseph G. Dawson, 
and Dorothy Terry, members of the writer’s thesis
committee, and Dr. B. J. Winer, who acted as sta- 
tistical advisor, is gratefully acknowledged. 

2 Formerly at the Mental Hygiene Clinic of Raleigh 
and Wake County. 


defined in terms of words referring to the im- 
mediate environment written in the Binet as- 
sociation task. The Binet task consists of 
having a subject write a list of words under 
experimentally controlled conditions and of 
determining, through inquiry, their temporal 
referents. It is assumed that words referring 
to the immediate environment are indicative 
of responsiveness to it. It is further assumed 
that the number of such words offers a meas- 
ure of the degree of responsiveness. Binet (2), 
employing the original version of the task 
with his two daughters, found that Mar- 
guerite, clearly an extrovert, gave a large 
number of references to her immediate sur- 
roundings; conversely, Armande, an introvert, 
gave few such references. For a succinct criti- 
cal appreciation of Binet’s work, the reader is 
referred to Goodenough’s discussion (5). A 
fuller exposition of the writer’s adaptation of 
the Binet technique has been presented else- 
where, together with a rationale for the ac- 
ceptance of the criterion measure as an op- 
erationally defined measure of extroversion- 
introversion (9). 

The general hypothesis of the study was: 
Rorschach indices of extratension and intro- 
version are related to the criterion measure 
of responsiveness to the immediate environ- 
ment. 

The following specific hypotheses were
tested:

1. Rorschach indices of extratension, i.e.,
C Total, C Sum, Fc + c + C’, are positively
related to the criterion measure.

2. Rorschach indices of introversion, i.e.,
M, FM + m, are negatively related to the
criterion measure.

3. Rorschach ratio indices of extratension- 
introversion, i.e., M:C Total, M:C Sum, FM 
+m:Fc+c+C’, are negatively related to 
the criterion measure. 


Procedure 


The experiment was conducted in a small 
room with a single window through which 
adjacent buildings and trees could be seen. 
Noises could be heard from outside the room, 
both within and without the building. The 
room was equipped with a table, three chairs, 
a bookcase holding several objects, and a 
magazine stand with a few magazines scat- 
tered upon it. These articles were its original 
equipment and, to avoid any possible impres- 
sion of contrivance, were not altered in any 
way. 

Fifty undergraduate students served as sub- 
jects in the experiment which was conducted 
individually. Two administrations of the Binet 
association task opened the experimental ses- 
sion. The Rorschach test was then given. A 
final administration of the Binet task fol- 
lowed. Standard sets of Rorschach inkblots 
were employed for the test. Sheets of paper, 
numbered vertically from one through twenty- 
five, and pen were provided for the Binet 
task. 


The administrative procedure for the Binet task 
was as follows: The subject was first instructed, 
“Write twenty-five words, any twenty-five words 
just as they come to mind. Write no sentences.” The 
task completed, the subject was questioned as to his 
choice of words. “What did you have in mind when 
you wrote that word? What made you write it?” 
If a question remained as to a word’s exact tem- 
poral referent, the examiner asked whether the sub- 
ject had “anything particular in mind” when he 
wrote it. 

The Rorschach test was administered in general 
accordance with the procedure outlined by Klopfer 
(7). A maximum of ten responses was allowed for 
each inkblot to insure completion of the test within 
a reasonable time period. 

Beck’s system (1) was used to score the deter- 
minants in the E-I indices formulated by Rorschach, 
Klopfer’s (7) to score the determinants in the in- 
dices he proposed. Certain responses are scored M 
under the former system but FM or m under the 
latter. In such cases both M and FM + m indices 
were credited. 

When a percept involved multiple determinants,
Beck’s rule (1) was followed and multiple credit
assigned, i.e., all determinants were equally
scored. The one exception taken to the rule was in
the circumstance of determinants composing the same
index, e.g., only one score would be credited to
the index FM + m on the basis of a percept
involving both animal and inanimate movements. It
was felt that multiple scoring in such cases would
overcredit the index.


A total of seventy-five words were obtained 
through the three administrations of the Binet 
association task. These were analyzed to de- 
termine the number that referred to the im- 
mediate environment. The latter constituted 
the criterion measure. 

The sole basis for inclusion of a word
within the criterion measure was that of
reference to the immediate environment; to any
aspect of it and in any manner whatsoever.
The immediate environment was defined as
extending over the entire experimental period,
i.e., from the moment the subject entered the
experimental room to that of his completing the
assignment. Examples of references to it are
words referring to the room’s furnishings, to
sounds coming from without the building, to
the subject’s physical states at the time of
the experiment.


Analysis of the Data 


Chi square and tetrachoric r were the sta- 
tistics employed in the investigation. These 
counting procedures have been recommended 
for use with Rorschach data since they are 
free of assumptions the latter cannot meet, 
i.e., normality of sample distributions and 
equality of test units (4). 

The χ²’s were computed with corrections
for continuity where necessary. One-tailed
tests of significance were employed since the
directions of relationship were predicted in
all instances. The .05 point was considered
crucial for rejection of the null hypothesis.

The Thurstone computing diagrams (3)
were employed to estimate the r_t’s. The z
transformation was used to average the latter
and to test the significance of differences
between them.
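
The χ² with correction for continuity for a fourfold table can be sketched as follows. This is a generic Python illustration of the Yates-corrected formula, not a reproduction of the author's worksheets; the function name and the example counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi square (1 df) for the fourfold table [[a, b], [c, d]].

    With yates=True the absolute cross-product difference is reduced
    by n / 2 (never below zero) before squaring.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    return n * diff ** 2 / denom


# Hypothetical split: rows = high vs. low index score,
# columns = high vs. low criterion score.
corrected = chi_square_2x2(10, 5, 5, 10)
uncorrected = chi_square_2x2(10, 5, 5, 10, yates=False)
```

The correction always lowers the statistic, which matters most with subgroups as small as those used here.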

The statistical treatment required the rank- 
order distribution and dichotomization of the 
experimental variables. The latter was car- 
ried out to achieve, as closely as possible, an 
equal division of the distributions into high 
and low score groups. If several ties appeared
at a distribution median, they were assigned 
to either group as best approximated equality 
of division. If but two ties appeared at this 
point they were randomly assigned, one to 
each group. 

In considering the total group of subjects,
certain of the Rorschach E-I indices were
found significantly related to R, i.e., M, C
Total, FM + m, and Fc + c + C’ are related
to R beyond the .05 point for the total group.
A procedure recommended by Cronbach (4)
was employed to control this factor. These
indices were studied, as regards their
relationship to the criterion measure, within
two subgroups formed, on the basis of R, from
the total group: a high R group, consisting of
twenty-four subjects, with R ranging from
forty-eight through one hundred, and a low
R group, comprising twenty-six subjects,
with R ranging from twenty-three through
forty-six. Their relationships to R are not
significant within these subgroups.

In order to obtain over-all estimates of the
relationships of M, C Total, FM + m, and
Fc + c + C’ to the criterion measure for the
total group of subjects, the results obtained
from the subgroups were combined, i.e., the
χ²’s added, with their significance being
determined for two degrees of freedom, and the
r_t’s averaged through the z transformation.
Those E-I indices not significantly related to
R in the total group, i.e., C Sum, M:C Total,
M:C Sum and FM + m:Fc + c + C’, were
studied directly as regards their relationship
to the criterion measure, i.e., in the total


Table 1

Summary of Rorschach and Criterion Measure Data

                                                             Relation to R
Variable            Mean     Range       SD      Division†    χ²       r_t

Total group
  R                54.16     23-100     26.63
  M                 7.88     0-32        5.98     24-26      2.93*    +.42
  C Total           7.88     1-18        4.86     24-26      4.16*    +.54
  C Sum             6.53     5-17.00     4.31     25-25      1.28     +.31
  M:C Total         1.76     0-17.00     1.34     25-25       .53     +.19
  M:C Sum           2.44     0-34.00     4.94     25-25       .04     +.07
  FM+m             11.7      2-37        6.62     25-25      5.02*    +.54
  Fc+c+C’           8.04     1-39        6.18     26-24      8.03*    +.73
  FM+m:Fc+c+C’      2.07     .25-8.00    1.59     25-25       .04     +.07
  Criterion        22.63     0-66       15.71

High R group
  R                77.75     48-100     18.65
  M                10.42     0-32        7.38     12-12       .17     +.25
  C Total           9.79     1-18        5.20     12-12       —‡      −.25
  FM+m              7.14     5-37        7.03     11-13       .00     +.02
  Fc+c+C’           5.38     2-39        5.84     12-12       .00      .00
  Criterion        25.25     0-66       16.58

Low R group
  R                32.38     23-46       7.42
  M                 5.54     1-12        2.65     13-13       .00     +.13
  C Total           6.12     1-14        3.74     13-13      2.46     +.60
  FM+m              9.71     2-20        A9       14-12       .08     +.14
  Fc+c+C’           3.42     1-13        5.12     14-12      1.07     +.36
  Criterion        20.19     2-56       14.45

† Division of Rorschach E-I variable distributions to determine relationship to R and criterion measure; numbers listed are subject totals in upper and lower halves, respectively.
‡ χ² of .17 was in direction opposite to that predicted.
* Significant at or below the .05 point.
Table 2

The Relation of Rorschach E-I Indices to the Criterion Measure

                                                                                      Combined results
                    Total group           High R group          Low R group           of subgroups
Rorschach E-I                   %                     %                     %                      %
Index               χ²    r_t  agt.†      χ²    r_t  agt.†      χ²     r_t  agt.†     χ²      r_t  agt.†

M                                         4.17* −.71  75         .15   .12  54        6.15*   .45  64
C Total                                   1.50  +.48  67        5.54** +.77 77       10.61** +.65  72
C Sum               5.12* +.54  68
M:C Total           5.12* −.54  68
M:C Sum             2.88* −.43  64
FM+m                                       .31  −.24  58         ‡     .13  46         .45    LOS  52
Fc+c+C’                                   4.17* +.71  75         ‡     .13  46        5.67*   .44  OO
FM+m:Fc+c+C’        1.28  −.31  6&

Note.—χ²’s for total, high R, and low R groups are based on one degree of freedom; χ²’s for the combined results of the subgroups are based on two degrees of freedom.

* Significant at the .05 point.

** Significant at the .01 point.

† % of subjects in group whose results are in agreement with hypothesized relationship of Rorschach E-I indices to the criterion measure.

‡ The χ²’s computed for the relationships of FM + m and Fc + c + C’ to the criterion measure were in a direction opposite to that predicted. For both indices χ² was .08.


group per se, without controls for R being 
instigated. 
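
The two combining steps described for the subgroups, adding χ²’s and averaging correlations through the z transformation, can be sketched in Python as below. Two points are assumptions: the averaging is unweighted (the article does not say whether the subgroups were weighted), and the p value is the ordinary upper-tail probability for 2 df, without the one-tailed adjustment the author applied. The correlations used are the C Total values for the high R and low R groups as read from Table 2.

```python
from math import atanh, exp, tanh


def average_r(rs):
    """Average correlations through Fisher's z transformation (unweighted)."""
    zs = [atanh(r) for r in rs]
    return tanh(sum(zs) / len(zs))


def combined_chi_square(chi_squares):
    """Sum independent 1-df chi squares; the total has len(chi_squares) df."""
    return sum(chi_squares)


def p_value_2df(chi_square_total):
    """Upper-tail probability of chi square with exactly two degrees of freedom."""
    return exp(-chi_square_total / 2)


# C Total correlations with the criterion: +.48 (high R) and +.77 (low R).
c_total_combined_r = average_r([0.48, 0.77])
```

The averaged value is about .65, in line with the combined C Total entry in Table 2. The closed form in `p_value_2df` holds only for 2 df; other degrees of freedom would need a general χ² distribution routine.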

Table 1 presents a summarization of the 
data obtained in the total, low R, and high R 
groups. The means, ranges, and SD’s of the 
various variable distributions are presented 
therein, the manner of their division for tests 
of significance and estimation of correlation, 
the relationship of Rorschach E-I variables 
to R. 


Results and Discussion 


The results of the investigation proper are 
presented in Table 2. 

Of the E-I indices studied directly in the 
total group of subjects, C Sum, M:C Total, 
and M:C Sum are related to the criterion 
measure in statistically significant fashion. 
The null hypothesis is sustained in the case 
of FM + m:Fc + c + C’.

None of the E-I indices studied in the sub- 
groups are significantly related to the cri- 
terion measure in both high R and low R 
groups. M and Fc + c + C’ demonstrate sig- 
nificant relationships in the former but not in 
the latter, C Total in the latter but not in 
the former, FM + m in neither. However the 
treatment of these indices in the subgroups 
was specifically to control for R. The com- 
bined results, from the subgroups, offering 


over-all estimates of the relationships to the 
criterion measure for the total group of sub- 
jects, are regarded as the crucial ones for the 
experiment. They permit rejection of the null 
hypothesis for M, C Total, and Fc +c + C’ 
while sustaining it in the case of FM + m. 
None of the statistically significant results 
was significantly different from the others. 


Summary 


A study was carried out with fifty subjects 
to test the assumption that Rorschach indices 
of extratension and introversion are indicative 
of opposed orientations towards the external 
environment. Assuming that extratension is 
characterized by responsiveness to the im- 
mediate environment and introversion by a 
lack of such responsiveness, an attempt was 
made to demonstrate relationships between 
Rorschach E-I indices and an operationally 
defined measure of responsiveness to the im- 
mediate environment. 

The Binet association task furnished the 
latter measure. The subjects were required to 
write, on three different occasions, the first 
twenty-five words that came to mind. The 
number of these words that referred to the 
immediate environment formed the criterion 
measure. 

The following Rorschach E-I indices were 
found significantly related to the criterion
measure at and below the .05 point of sig- 
nificance: M, C Total, C Sum, M:C Total, 
M:C Sum, Fc + c + C’. The null hypothesis 
was sustained for the relationships of FM + 
m and FM + m:Fc+c+C’. None of the 
statistically significant relationships was sig- 
nificantly different from the others. 


Received June 30, 1955. 
Rorschach (5) implied a triadic relation-
ship among motor behavior, motion percep- 
tion, and cognitive processes. Although he 
did not offer a theoretical rationale for his 
empirical observations, theorists of rather dif- 
ferent orientation have postulated similar rela- 
tionships. Werner and Wapner’s (11) sensory- 
tonic theory of perception has provided a 
theoretical basis for the vicarious relationship 
between two members of this triad—motor 
activity and motion perception. Studying sen- 
sory-tonic theory within the framework of the 
Rorschach situation, previous experiments (2, 
7) have revealed the inhibition of a motor act 
to be followed by an increased production of 
human movement (M) responses. Further, the
hypothesis that individuals who are motor- 
ically hyperactive tend to produce fewer 
movement responses than those who are less 
active has been supported experimentally (6, 
8, 10). 

A theoretical link also exists between motor 
activity and cognition. In psychoanalytic the- 
ory, the gradual shift from the primary to the 
secondary process involves the delay and con- 
trol of impulse expression, and the substitu- 
tion of symbolization, fantasy, planning, and
thinking as “experimental actions” in place 
of immediate response on a behavioral level. 
Rapaport (4) has pointed to the inhibition of
activity designed to produce gratification as
the primary condition underlying the
development of early fantasy and thus thinking.


1 From the Veterans Administration Regional
Office, Philadelphia, Pa., and Brooklyn, N. Y.

2 The assistance of Drs. William J. Cohen and

In a previous paper (1) dealing with cog- 
nitive behavior at the level of the formation 
and inhibition of associations, we have shown 
that the ability to inhibit an habituated ac- 
tivity at the motor level (writing a phrase as 
slowly as possible) is directly related to the 
ability to inhibit learned associations. Those 
Ss who were able to slow their writing down 
for the longer period of time were able to 
inhibit learned associations and substitute 
new ones for them more quickly than were 
poor motor inhibitors. Other studies (6, 9) 
have shown relationships between the percep- 
tion of M on the Rorschach, motor inhibition, 
measures of fantasy on TAT stories, and 
scores on tasks designed to measure general 
planfulness. Since motor and cognitive inhi- 
bition are interrelated (1), further support 
of the triadic relationship among motion per- 
ception, motor inhibition, and cognitive proc- 
esses would be obtained were it shown that 
M is also a measure of the ability to inhibit 
associations. 


The Experiment 


Hypothesis. A direct relationship between 
the production of Rorschach human move- 
ment responses and the ability to inhibit as- 
sociations is hypothesized. It is proposed that 
Ss who are sensitive to kinesthetic factors on 
the Rorschach will be better able to inhibit 
associations than those who are less percep- 
tive of human movement in the inkblots. 
Procedures. A sample of 93 neuropsychi-
atric patients, unselected as to diagnosis, was 
drawn from a mixed population of outpatient 
and hospitalized veterans at two VA facilities. 
The experimental procedures were integrated 
into the diagnostic battery given to these rou- 
tinely referred patients. Following the Ror- 
schach administration, two word association 
tasks were presented in order to get measures 
of cognitive inhibition time (CIT) and ordi- 
nary word association time (WAT). The pro- 
cedure previously described by the authors 
(1) is briefly as follows: 


A list of ten easy paired associates was read to the 
subject. After the associates were learned to a cri- 
terion of one perfect recitation, S was asked to re- 
spond upon presentation of the stimulus word with 
any word other than the learned associate. CIT was 
taken as the average time interval between the pres-
entation of the stimulus and the response for the ten 
pairs. Since this time is presumably taken up in part 
by the process of finding a new associate, a measure 
of word association time was obtained, usually fol- 
lowing an interval filled with other tasks. WAT was 
thus computed from the mean response time to a 
list of ten other words drawn from the same source 
as the original list (3). 


Results 


The sample was divided at the median into 
High M (2 or more M) and Low M (less 
than 2 M) groups. As can be seen in Table 1, 
there is little difference between these two 
groups in WAT, but the High M group is dis- 
tinctly superior in CIT to the Low M group. 
It is to be noted in the table that the lower
the CIT the more effective the inhibitory 
process is assumed to be. 

Analysis of covariance takes into account 
the small difference in WAT between the two 


Table 1

Mean Cognitive Inhibition and Word Association
Times, in Seconds, of High and Low
M and M% Groups

                              Word          Cognitive
                           association     inhibition
Group                 N       time            time

High M (≥ 2)         49       2.75            4.15
Low M (< 2)          44       3.10            5.99
High M% (≥ 10%)      46       2.51            4.09
Low M% (< 10%)       47       3.32            5.91


Table 2

Analysis of Covariance of Word Association and
Cognitive Inhibition Times of High and
Low M and M% Groups

                           Adjusted mean
                              square                 F
Source             df      M        M%          M        M%

Between groups      1     57.65    35.90       17.21*   10.11*
Within groups      90      3.35     3.55

* p < .01.

groups and shows the variation in CIT to be 
statistically significant with an F of 17.21, 
p < .01.

Since the number of M responses may be 
a function of the total number of responses 
(R), the sample was split at the median into 
High M% (10% or more) and Low M% 
(less than 10%) groups and treated statisti- 
cally in the same manner as for M. Table 1 
shows the resultant mean CIT and WAT to 
be identical in direction to those for M, and 
very similar in magnitude. Analysis of co- 
variance yielded an F value of 10.11, which 
is significant at p < .01 (Table 2). 

It is of interest to consider the findings in 
terms of the Erlebnistype ratio M:Sum C. 
This would enable us to assess whether CIT 
varies systematically with the weighted sum 
of color responses taken in conjunction with 
movement responses, and would give an in- 
dication of relative importance of the two 
factors. The median Sum C was obtained 
from the total sample. Those Ss falling be- 
low the median were designated low Sum C, 
while those falling above the median were 
designated high Sum C. Thus four groups 
were formed: Ss with 2 or more M and be- 


Table 3

Mean Cognitive Inhibition Times of High and Low
M Groups in Conjunction with Sum C

                                      CIT
Groups                            (in seconds)

High M—Low Sum C (N = 23)            4.11
High M—High Sum C (N = 26)           4.26
Low M—Low Sum C (N = 29)             5.90
Low M—High Sum C (N = 15)            6.13
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Analysis of Variance of Cognitive Inhibition Ti 





Cognitive Inhibition and Rorschach Movement Responses 
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mes of Different Movement and Color Types 








Source df SS V F p 
High M—Low M 1 77.898 77.898 15.76 01 
High Sum C—Low Sum C 1 117 A117 
MXSum C 1 1.192 1.192 
Within Groups 89 439.972 4.944 
Totals 92 519.179 


low median Sum C, 2 or more M and above 
median Sum C, less than 2 M and below 
median Sum C, and less than 2 M and above 
median Sum C. The results shown in Table 3 
reveal Sum C to be obviously of no signifi- 
cance in so far as CIT is concerned. Analysis 
of variance (Table 4) indicates both the vari- 
ance due to color and to the interaction of 
color and movement (Erlebnistype ratio) to 
be no greater than would be expected by 
chance. 
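The entries of Table 4 follow from the usual analysis-of-variance arithmetic: each variance (V) is a sum of squares divided by its degrees of freedom, and each F is an effect variance divided by the within-groups variance. A minimal Python sketch of that check, using the tabled sums of squares, is given here; the function and variable names are illustrative and do not come from the original report.

```python
# Sums of squares and degrees of freedom as printed in Table 4.
TABLE_4 = {
    "High M-Low M": (77.898, 1),
    "High Sum C-Low Sum C": (0.117, 1),
    "M x Sum C": (1.192, 1),
    "Within Groups": (439.972, 89),
}


def f_ratios(table, error_term="Within Groups"):
    """Return {source: F}, where F = (SS / df) / within-groups variance."""
    ss_error, df_error = table[error_term]
    error_variance = ss_error / df_error
    return {
        source: (ss / df) / error_variance
        for source, (ss, df) in table.items()
        if source != error_term
    }


ratios = f_ratios(TABLE_4)
total_ss = sum(ss for ss, _ in TABLE_4.values())
```

Here `ratios["High M-Low M"]` comes to about 15.76, matching the table, while the ratios for Sum C and for the interaction both fall well below 1, which is presumably why no F is printed for them. The four sums of squares add to the tabled total of 519.179.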


Discussion 


These results support the hypothesis that 
M, or empathic motion perception, is related 
to the ability to inhibit an activity at the cog- 
nitive level, just as it previously has been 
shown to be related to the ability to inhibit 
an activity at the motor level. In our earlier 
study, relating cognitive to motor inhibition 
(1), there was some question as to the degree 
to which the results were dependent upon 
associative facility. It was assumed in that 
experiment, as in this one, that associative 
facility, as measured by simple word associa- 
tion time, would vary randomly among the 
experimental population, and that obtained 
differences in CIT would be due to systematic 
variation in the ability to inhibit associations. 
A partial test of this assumption in the previ- 
ous study suggested associative facility to be 
at best a secondary factor. In this study, the
finding that simple word association time is
not significantly related to the M response
bears out the previous conclusion and further
reinforces the notion that M is related to the
inhibition process itself rather than to mere
verbal facility. The nearly identical findings
with M and M% tend to eliminate the ubiquitous
R (response total) variable from consideration
here.

The appraisal of the role of color response 
suggests that Ss who are productive of M re- 
sponses are good cognitive inhibitors regard- 
less of whether they give few or many color 
responses. Conversely, Ss who give few M 
responses are poor inhibitors of associations 
notwithstanding their sensitivity to color 
stimulation on the Rorschach. In view of the 
controversial status of the meaning of color 
responses, these data contribute to our un- 
derstanding in a negative way. While this ex- 
periment was not designed to investigate be- 
havioral correlates of color responses, and 
none were hypothesized, the results point to 
the movement response as being a more sen- 
sitive measure of control, at least at the cog- 
nitive level, than is the color response. This 
finding has previously been shown to hold for 
control at the motor level as well (6, 8), but 
the results with cognitive inhibition appear to 
be more definitive. 

These data then provide additional evi- 
dence in support of the theoretical struc- 
ture which considers impulse delay, empathic 
motion perception, fantasy, and thinking to 
be related and in some ways interdependent 
processes. The experimental evidence adduced 
in support of the theoretical formulation de- 
rives from the study of learned motor habits, 
learned associations, and situations generally 
removed from those in which primary im- 
pulses or strong affects are expressed. It re- 
mains to be investigated if the same relation- 
ships would obtain were the experiment to 
deal more directly with primary impulse or 
affect arousing situations. The data reported 
here were collected in the course of routine
clinical testing. From this vantage point, with
access to case history material, our observa- 
tions suggest that the techniques used in this 
and the previous study do actually reflect the 
degree to which strong impulses or affects are 
characteristically controlled in the process of 
avoiding conflict with society. These impres- 
sions, of course, require experimental verifi- 
cation. 


Summary and Conclusions 


This experiment was designed to test the 
hypothesis of a direct relationship between 
the production of human movement (M) re- 
sponses on the Rorschach test and the ability 
to inhibit associations. Ninety-three Ss were 
administered the Rorschach and a task de- 
signed to measure the efficiency of cognitive 
inhibition. The Ss were divided at the median 
into High and Low M and M% groups and 
these groups were then compared for cog- 
nitive inhibition time (CIT). Analysis of the 
results warrants the conclusion that indi- 
viduals who are more responsive to kinesthetic 
stimuli in the Rorschach are better able to 
inhibit associations than are those who are 
not productive of M responses. 

When these findings are considered in con- 
junction with those of previous related stud- 
ies, they lend further support to the triadic 
hypothesis interrelating motor behavior, mo- 
tion perception, and cognitive processes. 


Received August 12, 1955.


Actuarial Prediction


Robert D. Wirt


University of Minnesota 


Meehl (2) has pointed out the generally 
favorable position which statistical prediction 
appears to have relative to the clinical pre- 
diction of behavior. In his search of the litera- 
ture for pertinent data which could be used 
to compare the relative power of clinical vs. 
statistical prediction, he was able to find only 
a few appropriate studies. The present in- 
vestigation represents such a comparison. 

Data from an earlier study by the author 
were used (3). In that study it was found 
that Barron’s Ego-Strength scale (1) success- 
fully separated hospitalized psychiatric pa- 
tients who had shown great improvement in 
psychotherapy from those who did not im- 
prove, as rated by the patients’ psychiatrists. 

One test of clinical vs. statistical prediction 
amounts to the difference between running 
psychometric data through a machine and ac- 
cepting an actuarial cutting line from which 
selection for treatment will be made as against 
running the same test results through the 
intuitive psyches of one or more clinicians 
who then make similar selections. In the pres- 
ent study the 19 patients who were rated as 
the most greatly improved as a result of a 
short period of psychotherapy from a total 
N of 535 consecutive admissions to a neuro- 
psychiatric hospital were matched for age,
days of hospitalization, and diagnosis with 19
patients who were rated as unimproved by
their doctors. The composition of the sample
is shown in Table 1. Most of the traditional
diagnostic categories were represented, with
most cases being classified among the neuroses.
As the data in Table 1 suggest, there is
a positive relationship between IQ and Es
score. This is consistent with Barron’s finding.
The MMPI profiles (scored for the usual
four validity and ten clinical scales) for these
38 patients were given to eight expert judges,¹
all experienced in Multiphasic interpretation.
They were asked to sort the profiles into two
equal groups, one category to include those
patients for whom the judge would predict
improvement with psychotherapy and the
other category for those the judge predicted
would not improve in psychotherapy. The results
of the judgments are shown in Table 2,
where it will be seen that only one judge exceeds
chance at an acceptable level of statistical
confidence.


1 The author wishes to express his appreciation to
Drs. Harold Gilberstadt, Starke R. Hathaway, Richard
C. Kogl, Paul E. Meehl, John J. Regan, Albert
Rosen, William Schofield, and Mr. Arthur J. Gallese,
who were the judges in this investigation.


Table 1
Composition of the Sample

                                  Greatly
Statistic                         improved    Unimproved
Mean age in years                   28.5         32.5
Mean days of hospitalization        38.5         39
Mean IQ                            124          105
Median Es score                     50.5         37.5
N                                   19           19


Table 2
Clinical Sorts of the MMPI Profiles

           Number of correct
Judge      judgments (N = 38)       χ²
A                 16                .95
B                 22                .95
C                 20                .10
D                 22                .95
E                 26               4.90*
F                 20                .10
G                 20                .10
H                 18                .10

* p = .04

The Es scale values for these groups mis- 
classify four individuals in each direction, if 
the median of the total sample is used as the 
cutting line. Thus the number of correct judg- 
ments for the mechanical sorting would be 30, 
which gives a chi-square value of 12.73, the 
probability of which is less than .001. In this 
situation the use of mechanical sorting clearly 
beats clinical prediction. 
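The chi-square values quoted here can be approximated with a one-degree-of-freedom goodness-of-fit test that compares the number of correct and incorrect sorts with the 19 of each expected by chance. The following Python sketch rests on that assumption; the report does not state the formula used, and the function names are illustrative.

```python
import math


def chi_square_vs_chance(n_correct, n_total=38):
    """Chi-square (1 df) for correct/incorrect counts against a 50-50 split."""
    expected = n_total / 2
    n_wrong = n_total - n_correct
    return ((n_correct - expected) ** 2 + (n_wrong - expected) ** 2) / expected


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variate with 1 df."""
    return math.erfc(math.sqrt(chi_square / 2))


mechanical = chi_square_vs_chance(30)   # Es scale: 30 of 38 correct
judge_b = chi_square_vs_chance(22)      # a judge with 22 of 38 correct
```

Under this assumption `mechanical` is about 12.74 with a probability below .001, in line with the 12.73 reported, and `judge_b` is about .95 as in Table 2. The same formula gives about 5.16 for judge E's 26 correct sorts rather than the tabled 4.90, so the original computation may have differed in detail.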


Discussion 


Service psychologists frequently are asked 
to make recommendations for treatment. It is 
a matter of very great personal and economic 
importance to them and their patients that 
their predictions be accurate. If results such 
as those described here are indicative, it might 
be by far in the best interest of patients to 
devote our time and skill to the develop- 
ment of refined actuarial and mechanical pro- 
cedures. However, it should be remembered 
that what is here called “clinical” is not by 
any means what most psychologists would 
mean by that word. Meehl implies that such 
examples as the one given here can be used 
to compare the clinical with the statistical 
approach. As a matter of fact, in the prac- 
tical situation most psychologists make their 
recommendations for treatment upon psychometric
data, as used in this investigation, together
with social history, medical data, interview
material, and observation of the patient.
An appropriate test of the difference in power 
between the two approaches would be to have 
clinical predictions made from total psycho- 
logical examinations; these to be compared 
with actuarially derived measures such as the 
Es scale. 

A further word should be said about the 
base-rate problem in prediction. In the pres- 
ent investigation the situation presented to 
the judges was a gross distortion of the base 
rates of recovery following psychotherapy in 
the hospital population used. Of 225 patients 
treated by psychotherapy alone, only 19 cases 
were rated greatly improved; but the judges 
were given the problem of selecting these 19 
from an N of only 38 cases. The real task of 
the clinician, or of the machine, in a practical 
situation would be to identify the good bets 
for psychotherapy out of the total population. 
The Es scale in the study cited could be stiff 
competition for a clinical approach. 
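The base-rate point can be made concrete with a little arithmetic. The Python sketch below assumes, purely for illustration, that the Es cutting line would keep the same hit rates in the full treated population (15 of 19 correct in each group, as in the matched sample). That assumption is not tested in the report, and the function name is hypothetical.

```python
def expected_selection(n_improved, n_others, hit_rate):
    """Expected true and false positives, and the share of selected
    patients who are greatly improved, for a two-way cutting line."""
    true_pos = n_improved * hit_rate
    false_pos = n_others * (1 - hit_rate)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


hit_rate = 15 / 19  # 4 of 19 misclassified in each direction

# Matched sample given to the judges: 19 greatly improved, 19 unimproved.
_, _, share_matched = expected_selection(19, 19, hit_rate)

# Patients treated by psychotherapy alone: 19 greatly improved of 225.
_, _, share_population = expected_selection(19, 225 - 19, hit_rate)
```

With equal groups about 79 per cent of the patients picked by the cutting line are greatly improved cases. At the base rate of 19 in 225, the same line would pick roughly one greatly improved patient for every three others, a share of about 26 per cent.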


Received July 26, 1955.


Some Determinants of Successful and Unsuccessful
Adaptation to Hospital Treatment of Tuberculosis¹


Louis J. Moran, George W. Fairweather, and Robert B. Morton 


Veterans Administration Hospital, Houston, Texas 


One large hospital treated and discharged 
over one thousand veteran and military tu- 
berculous patients in 1952. Both veteran and 
military patients were treated under the same 
hospital conditions and by the same medical 
staff. Of the military patients, 98 per cent 
were treated successfully (disease rendered in- 
active). Of the veteran patients, only 33 per 
cent were treated successfully. This marked 
difference between the two groups in success 
of treatment for tuberculosis may be ac- 
counted for almost entirely by the voluntary 
discharge of 47.8 per cent of the veteran 
group, against the advice of their physicians 
(6). 

This datum typifies what has become the 
major problem in the successful treatment 
and control of tuberculosis, that of keeping 
the patient in the hospital until his treatment 
is completed. Every year, from one-third to 
one-half of the patients in tuberculosis hos- 
pitals leave the hospital against medical ad- 
vice (6, 8). 

In recent years, a growing number of ex- 
perimental studies of the problem of “irregu- 
lar discharge” (leaving the hospital against 
medical advice) have appeared. Tollen (8) 
has provided an excellent review of the studies 
reported prior to 1948. Representative studies 
since this time have used various measures 
such as the MMPI (1, 4, 5), tests of intelli- 
gence (7), and demographic information (2, 
5) in an effort to differentiate the potential 
“irregular discharge” patient from other hos- 
pitalized tuberculous patients. 


1 The authors should like to acknowledge the generous
support and guidance of Drs. Daniel E. Jenkins,
Hollis G. Boren, and Irving Chofnas, tuberculosis
specialists, in the conduct of this study.


The present investigation proposes to examine
the relationship of the patient’s (a)
family of orientation, (b) prehospital life adjustment,
(c) attitudes towards situations in
the hospital, (d) behaviors in the hospital,
and (e) current outside-the-hospital situations,
to his success or lack of success in remaining
in the hospital until his treatment is
completed. For purposes of analysis, it is hypothesized
that a high level of adjustment in
each of these areas is associated with successful
completion of hospital treatment. A formal
statement of the hypotheses to be tested
is provided in the section on results.


Method 


The initial data were obtained from 140 
male veterans who were currently receiving 
treatment for tuberculosis in a large Veterans 
Administration GM&S hospital. The sample 
comprised twenty patients from each of seven 
wards which represented different stages of 
treatment. Thus, the sample included all in- 
dividuals receiving treatment for tuberculosis 
at the installation, with the exception of four 
who would not cooperate and 40 who were too 
ill to be seen. Nine months after the initial 
interview the hospital records were examined 
to ascertain which patients had left and the 
nature of their discharge. 

The criteria for irregular discharge were:
(a) the patient must have left voluntarily, and
(b) he must have left against the advice of his
physician. The criterion for regular discharge
was simply that the patient had successfully
completed treatment. In addition to these criteria
unique to each group, individuals with a
history of mixed discharges were eliminated
from the sample. Thus, six patients were excluded
who had received a discharge in the
past that was the opposite of their current
discharge. Of the 140 subjects interviewed,
39 met the criteria for regular discharge and
19 met the criteria for irregular discharge
within the nine-month period.²

Information concerning the patient’s past 
life adjustment and current outside-the-hos- 
pital situations was obtained through struc- 
tured interviews with the patient. The data 
concerning the patient’s adjustment to the 
hospital encompassed his attitudes toward the 
institution and his ward behavior. The former 
was obtained through interview, the latter 
from behavioral ratings by nurses and aides. 
Owing to the nature of their illness, each pa- 
tient was seen for not more than thirty min- 
utes at any one time. The number of such 
interviews varied from four to seven, depend- 
ing upon the patient. All responses were re- 
corded on data sheets while the interview was 
in progress. In the account to follow, a brief 
description of the content and the method of 
scoring each measure is provided.³


Measurement of Prehospital Life Adjustment 


A life adjustment scale was devised, comprising
132 items of information which covered
nine areas of living.⁴ Here the cardinal
principle was to cover comprehensively areas
requiring adaptation in the patient’s life. The
items dealt primarily with factual information
rather than with attitudes or judgments. Thus,
a patient was not asked about his attitudes
toward job, peers, parents, etc., but rather
such questions as the number of jobs held,
number of close friends, amount of time each
parent was home, etc.


2 A review of the records of this hospital from
1949 to the present indicated that over 75 per cent
of all irregular discharges have taken place within
nine months after admission. Since the yearly irregular
discharge rate at this hospital had been about 50
per cent we expected at the nine month period to
have in the irregular discharge sample over 50 of
the 140 subjects examined. The occurrence of only
19 irregular discharges in this sample is accounted
for in part by a 15 per cent reduction in irregular
discharge rate during this period. However, the further
reduction of expected cases, not accounted for
by the lowered irregular discharge rate, throws light
on a hitherto little appreciated aspect of the irregular
discharge problem, the implications of which are
developed in the Discussion section.

3 The complete interview schedules and examination
forms, together with scoring manuals, have been
deposited with the American Documentation Institute.
Order Document No. 4741 from the ADI Auxiliary
Publication Project, Photoduplication Service,
Library of Congress, Washington 25, D. C., remitting
in advance $3.75 for photoprints or $2.00 for 35
mm. microfilm. Make checks payable to Chief,
Photoduplication Service, Library of Congress.

4 In the construction of this scale we borrowed
ideas freely from many published scales. Our greatest
single debt is to Leslie Phillips’ Social Attainment
Scale (3).


Briefly, the areas covered were as follows.
A. The family background scale, composed of the following
five subscales: (a) parents’ economic status, (b)
parents’ education, (c) parents’ occupation, (d)
completeness of family life, i.e., items concerning
siblings, parents’ marital status, etc., (e) security aspects
of early life, e.g., who raised the patient, parents’
time at home, etc.
B. The educational adjustment scale consisted of items
encompassing grade school and high school truancy,
adjustment to peers and teachers, etc.
C. The social relationship scale covered childhood,
adolescent, and adult antisocial behavior, relationships
with peers, organizational membership, etc.
D. The health scale was concerned with childhood,
adolescent, and adult illness, amount of medical
attention, etc.
E. The drinking scale was concerned with the amount
and type of alcoholic indulgence in adolescence and
adulthood.
F. The religion scale covered childhood, adolescent,
and adult church attendance.
G. The army adjustment scale dealt with relationships
with peers and officers, gain in rank, etc.
H. The occupation adjustment scale was concerned with
job level, number of jobs held in past 5 years, reasons
for leaving a job, etc.
I. The marital adjustment scale included marital status,
number of marriages, number of children, disciplining
of children, etc.


Scoring. After all items of the life adjustment
scale had been selected, each possible response
to an item was ranked on an a priori
basis from least to most adaptive. The score
for each possible response to an item was its
rank position, with the least adaptive response
receiving a score of zero. Thus, the responses
to an item with two response possibilities received
scores of zero and one. An item with
three response possibilities received scores of
zero, one, and two, and so on through the items.

To prevent items with the greatest number
of response possibilities from unduly influencing
the total scale score, responses to each
item were then rescored on a scale from zero
to nine. Thus, an item with two responses
divided the zero to nine scale into three equal
units and the two alternatives received scores
of 3 and 6. A three-response item divided the
zero to nine scale into four equal units and
the three alternatives received scores of 2.3,
4.5, and 6.7, respectively. All items on the
scales were so rescored. Any scale score was
the sum of the items comprising that scale.
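The rescoring rule can be written compactly: an item with n possible responses splits the zero-to-nine range into n + 1 equal units, and the response with rank r (counting from zero) is placed r + 1 units up. The Python sketch below is inferred from the worked examples in the text rather than taken from the scoring manual, and the function names are hypothetical.

```python
def rescale_to_nine(rank, n_responses):
    """Place a rank (0 = least adaptive) on the zero-to-nine scale.

    The range 0-9 is cut into n_responses + 1 equal units and the
    response with the given rank sits rank + 1 units above zero.
    """
    if not 0 <= rank < n_responses:
        raise ValueError("rank must lie between 0 and n_responses - 1")
    unit = 9 / (n_responses + 1)
    return (rank + 1) * unit


def scale_score(item_responses):
    """Sum rescaled item scores; each entry is (rank, n_responses)."""
    return sum(rescale_to_nine(rank, n) for rank, n in item_responses)


two_way = [rescale_to_nine(r, 2) for r in range(2)]      # [3.0, 6.0]
three_way = [rescale_to_nine(r, 3) for r in range(3)]    # [2.25, 4.5, 6.75]
```

The three-response values 2.25 and 6.75 correspond to the 2.3 and 6.7 printed in the text, which appear to have been rounded to one decimal.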


Measurement of Current Hospital Adjustment 


Attitude scale. An attitude scale was de- 
veloped in an attempt to provide a quantita- 
tive measure of each patient’s attitudinal ad- 
justment to the hospital. The attitude scale 
comprised 45 items that were selected as con- 
stituting important adjustmental problems for 
tuberculous patients. The problem situations 
were established by means of extensive un- 
structured recorded interviews with ten ran- 
domly selected tuberculous patients. A com- 
pilation of these patient statements consti- 
tuted the initial set of items. These items 
were organized, on an a priori basis, into the 
following areas: (a) attitudes about ward
regulations, (b) attitudes about personnel,
(c) attitudes about other patients, (d) attitudes
about hospital facilities.

The 45 items were couched in terms of
statements, e.g., “Bed rest is the most important
part of treatment.” During the interview
the statements were read to the patient.
The patient was instructed to respond in one
of the following four ways: strongly agree,
moderately agree, moderately disagree, or
strongly disagree. The patient’s responses were
recorded on a data sheet during the interview.

Attitude scale scoring. First, the experimenters
made a judgment as to whether agreement
or disagreement with a particular item
could be considered least or most adaptive to
prolonged hospitalization. The response of
the patient was then scored as 0, 1, 2, or 3,
depending on the intensity of the attitude and
its adaptive directionality. Thus, an item considered
adaptive in the direction of agreement
received the following scores: strongly disagree = 0,
moderately disagree = 1, moderately agree = 2,
strongly agree = 3. Each item
received scores of 0, 1, 2, 3, which were then
transformed into the nine-point scale values
of 1.8, 3.6, 5.4, and 7.2, respectively. The
score for any area was the sum of the items
comprising that area.
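As a sketch of the attitude scoring just described, the Python fragment below maps a response to 0-3 according to the item's adaptive direction and then to the nine-point values. The reversal for items keyed toward disagreement is an assumption (the text spells out only the agreement case), and all names are illustrative.

```python
RESPONSES = ["strongly disagree", "moderately disagree",
             "moderately agree", "strongly agree"]


def attitude_item_score(response, adaptive_if_agree=True):
    """Return (raw 0-3 score, nine-point scale value) for one item."""
    raw = RESPONSES.index(response)
    if not adaptive_if_agree:
        # Assumed: items keyed toward disagreement are reverse-scored.
        raw = len(RESPONSES) - 1 - raw
    unit = 9 / (len(RESPONSES) + 1)   # four responses -> units of 1.8
    return raw, round((raw + 1) * unit, 1)


agree_keyed = attitude_item_score("moderately agree")
disagree_keyed = attitude_item_score("moderately agree",
                                     adaptive_if_agree=False)
```

For an item keyed toward agreement, "moderately agree" receives a raw score of 2 and a scale value of 5.4; under the assumed reversal the same response to a disagreement-keyed item receives 1 and 3.6.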

Behavior rating scale. The guiding motive 
in the construction of the behavior rating 
scale was to obtain a comprehensive descrip- 
tion of ward behavior. The primary criterion 
for selection of a specific item was that it de- 
scribed an activity, a plainly observable ac- 
tion, on the part of the patient. In the col- 
lection of such items an explicit effort was 
made to suspend temporarily judgments of 
the relevancy of the item to prediction of ir- 
regular discharge. 

Initially, about 80 such items were derived
from the knowledge gained through informal
observation over a year’s time of the day to
day living of patients on seven tuberculosis
wards. After a series of preliminary tests for
clarity and “ratability” of items, the behavior
scale was reduced to the present 64 items.
This array of items was then re-examined
and grouped on a rational basis into behaviors
concerning six areas of hospital adaptation.
These six areas were: (a) regulations to
protect patient’s health, (b) regulations to
protect the health of others, (c) personnel,
(d) other patients, (e) hospital facilities, (f)
interaction with people outside hospital.

Behavior rating scoring. Here again, the experimenters
made an a priori judgment as to
whether a particular observable behavior was
least or most adaptive to prolonged hospitalization.
Four raters, two nurses and two aides
who had usually attended the patient for several
months, rated him on each item. Each
item comprised only two alternatives which
took the generic form of behaving adaptively
or not behaving adaptively. An item was
scored in the following manner. If the four
raters checked the adaptive alternative the
item was scored four, if three of the four
raters checked the adaptive alternative the
item was scored three, if two raters checked
the adaptive alternative the score was two, if
one rater checked the adaptive alternative the
score was one, if no rater checked the adaptive
alternative the score was zero. The zero
through four scores were then placed on a
nine-point scale and assigned scores of 1.5,
3.0, 4.5, 6.0, and 7.5, respectively. The patient’s
score for any area was the sum of the
items comprising that area.


Table 1
Comparison of Regular and Irregular Discharge Groups on Prehospital Life Adjustment Scales
(N = 58)*

                                      Regular           Irregular
Scale                              Mean      SD      Mean      SD        t       p†
Area of family background        126.53   13.81    127.31   15.81       —       —
  Parents’ economic status        24.21    3.72     24.37    3.57       —       —
  Parents’ education status        6.62    2.62      7.21    2.55       —       —
  Parents’ occupation status      15.85    3.61     16.42    2.61       —       —
  Completeness of family life     29.36    6.42     29.89    7.25       —       —
  Security of childhood           50.49    5.63     49.42    7.85      .53    <.50
Area of education
  Attainment                       3.72    1.47      3.26    1.05     1.06    <.20
  Grade school*                   37.50    5.62     33.47    6.88     2.20    <.01
  High school*                    28.88    5.10     24.67    5.95     1.55    <.10
Area of social relations         107.24   11.36    101.18   12.11     1.83    <.05
  Childhood                       38.49    3.61     37.47    3.78      .98    <.30
  Adolescence                     31.60    5.14     29.20    4.82     1.75    <.05
  Adulthood                       37.15    5.14     34.51    7.26     1.43    <.10
Area of health                    50.65    5.76     50.68    6.64       —       —
  Childhood                       21.72    2.88     22.58    2.61       —       —
  Adolescence                     13.90    4.21     14.20    6.17       —       —
  Adulthood                       15.03    2.88     13.90    3.38     1.22    <.20
Area of religion                  16.62    2.85     15.00    3.30     1.84    <.05
  Childhood                        6.21     .84      6.00    1.08      .75    <.40
  Adolescence                      5.56    1.45      4.84    1.70     1.81    <.05
  Adulthood                        4.85    1.35      4.16    1.30     1.97    <.05
Area of drinking                  25.36    3.13     23.89    4.54     1.28    <.20
  Adolescence                      6.92     .82      6.47    1.43     1.25    <.20
  Army                             6.08    1.14      5.68    1.33     1.08    <.20
  Adulthood                        5.92    1.68      5.42    2.05      .93    <.30
  Last six months                  6.44    1.46      6.32    1.98  [illegible] <.80
Area of occupation                52.44    9.69     46.06    9.90     2.32    <.01
  Childhood                        4.41    1.40      4.37     .94      .13    <.80
  Adolescence                      2.90    1.37      2.58    1.27      .89    <.30
  Adulthood                       45.13    8.29     39.11    9.96     2.28    <.01
Area of the military              52.08    7.34     48.47    6.11     2.00    <.01
Area of marriage
  Marital status                   5.77    2.46      4.89    2.92     1.13    <.20
  Family life*                    41.53    6.20     41.86    4.23       —       —

* Grade school, high school, and family life subscales were computed with N’s of 55, 22, and 48, respectively.
† p values are for one-tailed tests. No p value was ascertained for means representing reverse directionality.
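The behavior rating scoring described under Measurement of Current Hospital Adjustment (four raters, two alternatives per item) can be sketched in Python as follows. The sketch assumes the five possible counts, zero through four, are spaced evenly on the nine-point scale, which reproduces the values 1.5 through 7.5 given in the text; the function names are illustrative.

```python
def behavior_item_score(adaptive_checks):
    """Score one item from four raters' checks (True = adaptive).

    Returns the raw count of adaptive checks (0-4) and its value on
    the nine-point scale (1.5, 3.0, 4.5, 6.0 or 7.5).
    """
    if len(adaptive_checks) != 4:
        raise ValueError("each item is rated by exactly four raters")
    raw = sum(bool(check) for check in adaptive_checks)
    return raw, (raw + 1) * 1.5


def area_score(items):
    """Sum the nine-point values of the items making up one area."""
    return sum(behavior_item_score(checks)[1] for checks in items)


three_of_four = behavior_item_score([True, True, False, True])
```

An item on which three of the four raters checked the adaptive alternative thus receives a raw score of 3 and a scale value of 6.0.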


Measurement of Situations External to the 
Hospital 


From the unstructured interviews with ten 
patients and from discussions with staff mem- 
bers, 20 specific problem situations were iso- 
lated. These were then formulated into 20 
items comprising four areas. These areas were:
(a) patient’s future employability, (b) pressure
from others to continue treatment, (c)
current economic status, (d) other family responsibilities,
i.e., children, family illness, etc.

Scoring. The 20 items were scored in the
same manner as the life adaptation items. All
possible responses to each item were ranked
on an a priori basis from the least to most
adaptive. The score for each possible response
to an item was its rank position, with
the least adaptive response receiving a score
of zero. The responses to each item were then
rescored on a scale from zero to nine.

All scales used in the present study placed
each response to every item on an a priori
nine-point scale. Such a scoring system served
several purposes. First, it preserved the least
to most adaptive directionality. Secondly,
each item contributed equally to the total
score. Finally, all scale scores were independent
of the empirical distribution, thereby
simplifying replication of the study.


Results 


The first hypothesis may be stated as fol- 
lows: Irregular discharge patients when com- 
pared with regular discharge patients will 
have demonstrated poorer adjustment in areas 
of living prior to hospitalization for tubercu- 
losis. Table 1 indicates that this hypothesis 
was supported at the 5 per cent level of 
confidence in the areas of: (a) social relations,
(b) religion, (c) occupation, and (d)
the military. The hypothesis was not supported
in the areas of (a) family background,
(b) health, (c) drinking, and (d) marriage.
Viewed developmentally, Table 1 shows that
the irregular group demonstrated significantly
poorer adjustment than the regular group in
grade school, in social relations and religion
during adolescence, and in religion, occupation,
and the military in adulthood.
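The t values in Table 1 can be approximated from the tabled means and standard deviations. The report does not say which formula was used; the Python sketch below assumes an unpooled standard error, which comes within about .01 of several tabled values (for example, the area of occupation), though not of all of them. The function name is illustrative.

```python
import math


def t_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t for two independent groups using an unpooled standard error."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


# Area of occupation: regular group (N = 39) vs. irregular group (N = 19).
t_occupation = t_from_summary(52.44, 9.69, 39, 46.06, 9.90, 19)

# Area of social relations, same group sizes.
t_social = t_from_summary(107.24, 11.36, 39, 101.18, 12.11, 19)
```

Under this assumption `t_occupation` is about 2.32 and `t_social` about 1.82, against tabled values of 2.32 and 1.83.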

The second hypothesis stated: Irregular dis- 
charge patients when compared with regular 
discharge patients will demonstrate poorer ad- 
justment to situations within the hospital. 
Table 2 indicates that this hypothesis is 
not supported in terms of the expression of 
attitudes toward ward regulations, hospital 
personnel, other patients, and the facilities 
offered by the hospital. However, Table 3 
indicates that the second hypothesis is sup- 
ported when the ward behavior of the two 
groups is compared. In all six areas measured
the behaviors are in the predicted direction.
A further examination of Table 3 indicates
that specifically the irregular discharge group
demonstrated significantly poorer behavioral
adaptation to restrictions of behaviors considered
to be detrimental to their own health and
to the health of others. Their relationships
with ward personnel were also significantly
less adaptive.

The third hypothesis stated: The type of
situations external to the hospital are less
conducive to continued hospital treatment for
the irregular than for the regular discharge
patients. Reference to Table 4 indicates that
this hypothesis was not supported in any of
the four selected classes of situations.


Table 2
Comparison of Irregular and Regular Discharge Groups on Attitudinal Adaptation to Hospitalization
(N = 58)

                              Regular           Irregular
Scale                      Mean      SD      Mean      SD       t      p*
Attitude toward:
  ward regulations        60.51    9.79     58.16    8.32     .96    <.30
  personnel               86.19   15.68     86.88   14.41      —       —
  other patients          21.87    2.93     20.53    4.00    1.34    <.10
  hospital facilities     69.00    9.36     66.42   10.13     .93    <.30

* p values are for one-tailed tests. No p value was ascertained for means representing reverse directionality.


Table 3
Comparison of Irregular and Regular Discharge Groups on Behavioral Adaptation to Hospitalization

                                                 Regular           Irregular
                                              Above    Below    Above    Below
Scale                                         Median   Median   Median   Median    χ²*      p†
Behavior concerning:
  regulations to protect patients’ health       23       16        6       13     3.84    <.01
  regulations to protect others’ health         21       18        2       17     9.90    <.001
  personnel                                     22       17        5       14     4.54    <.01
  other patients                                20       19        6       13     1.99    <.10
  hospital facilities                           19       20        5       14     2.57    <.10
  people outside hospital                       19       20        7       12      .70    <.30

* Median test utilized because of heterogeneity of variance.
† p values are for one-tailed tests.


Table 4
Comparison of Irregular and Regular Discharge Groups on the Adaptiveness of
Situations External to the Hospital

                                                 Regular           Irregular
                                              Above    Below    Above    Below
Scale                                         Median   Median   Median   Median    χ²*      p†
Adaptation concerning:
  patients’ employability                       18       21        5       14     2.04    <.10
  pressure from others to continue
    treatment                                   16       12        4        8     1.92    <.10
  current economic status                       14       14        6        6       —       —
  other family responsibilities                 14       14        6        6       —       —

* Median test utilized because of heterogeneity of variance.
† p values for one-tailed tests. No p value was ascertained for scales where frequency above and below the median was equal for both groups.


Discussion


When the results of this investigation are
viewed collectively it would appear that hospitalization
is but the most recent of a long
series of life situations in which the irregular
discharge patient has demonstrated his inability
to make an adequate adjustment.

In an earlier study, the present authors
were critical of the “personality determinism”
type of approach typically used in experimental
studies of the irregular discharge problem.
At that time we argued, “It seems
equally reasonable to assume that other factors,
entirely independent of personality, may
be just as important correlates of such behavior
as the personality that the patient
‘brings with him’ into his particular hospital
environment” (2, p. 66).

Insofar as extra-personality determinants
were measured in the present study, however,
they did not serve to differentiate the irregular
and regular discharge groups. The results
of the present study would suggest instead
that the best predictor (or determinant) of
response to hospitalization is past behavior in
a variety of situations, with specific situational
differences in the conditions of hospitalization
playing a less important role than
we formerly might have hypothesized.

Differences between the regular and irregu- 
lar discharge groups in past life adjustment 
are striking. In grade school the irregular dis- 
charge group was significantly more truant, 
had more difficulty with peers and teachers. 
In adolescence they had fewer friends, fewer 
social interests, and quit attending church 
earlier. In the military, they were more fre- 
quently disciplined, they more often disliked 
officers and they more frequently failed to 
gain in rank. Occupationally, they changed 
jobs more frequently because they were “tired 
of the job” or were dismissed; they held more 
menial jobs, were less well-paid, and were 
more frequently unemployed. In most of the 
prehospital behavior areas measured the mean 
of the irregular group is in the direction of 
maladjustment, and in over half of the areas 
significantly so. 

This maladjustive pattern, as reflected in 
the gross examination of past behavior, is 
magnified by direct observation of behavior 
in the hospital. Here the behavior of this 
group, as independently rated by four observ- 
ers long before voluntary discharge against 
medical advice, was in the direction of mal- 
adjustment on every variable measured. The 
maladaptive nature of their behavior in the 
hospital was most evident in their disregard 
of regulations designed to protect their own
health and the health of others, and in their
friction with hospital personnel. 

One further point about the nature of the 
irregular discharge problem should be men- 
tioned. A small group of individuals, prob- 
ably no more than 10 or 15 per cent of all 
people hospitalized for tuberculosis, appear to 
constitute the entire irregular discharge prob- 
lem. This small group, through repeated ir- 
regular discharges, builds up the reported an- 
nual “irregular discharge rate” to almost 50 
per cent. For example, the nineteen patients 
in the present sample have already con- 
tributed an average of four irregular dis- 
charges. Thus, a relatively small group of 
maladjusted individuals seem to provide the 
source for the technically accurate but some- 
what misleading statement that almost half 
of all hospitalized tuberculous patients leave 
the hospital against medical advice. 


Summary 


This study examined some relationships of 
the tuberculous patient’s (a) family of orientation,
(b) prehospital life adjustment, (c)
verbalized attitudes toward situations in the 
hospital, (d) behaviors in the hospital, and 
(e) current outside-the-hospital situations, to 
his success or lack of success in remaining in 
the hospital until his treatment was com- 
pleted. Information in the above five areas 
was collected on 140 currently hospitalized 
patients. Within nine months after examina- 
tion, 19 patients voluntarily left the hospital 
against medical advice and 39 completed their 
hospital treatment. Comparison of these two 
groups disclosed no differences in family of 
orientation, verbalized attitudes toward the
hospital, or in current outside-the-hospital
situations. Significant differences between the 
groups appeared, however, in prehospital life 
adjustment and in behavior within the hos- 
pital. It was concluded that hospitalization is 
but the most recent of a long series of life 
situations in which the irregular discharge 
patient has demonstrated his inability to 
make an adequate adjustment. 


Received June 27, 1955. 
The Use of The W-B Picture Arrangement Subtest
as a Projective Technique¹


Boris Breiger 
Chicago State Hospital 


The W-B Picture Arrangement (PA) subtest was
used as a projective technique in a cross-cul- 
tural study (1) comprising 30 U. S. Cauca- 
sians, 20 Nisei, and 10 German refugees. All 
Ss were male, except two refugees. Variables 
controlled were Full-scale IQ, college status, 
age, urban-rural residence, and bilingualism. 
After completing the entire W-B, the “Flirt” 
and “Taxi” sequences were re-presented ac- 
cording to the Ss’ original arrangements. Ss 
were asked to entitle the episodes and tell the 
story, card by card. A TAT-type inquiry fol- 
lowed. 

Although no significant group differences 
were found on PA weighted scores or on raw 
scores of all sequences, a content analysis of 
“Flirt” and “Taxi” revealed marked differ- 
ences. Categorization of “Flirt” and “Taxi” 
into projected motivational factors reveals
chi-square values significant at the .01 and .05
levels respectively. Eighty per cent of the 
Caucasians projected a flippant story on 
“Flirt,” entitling it “Pick up,” while 65 per 
cent of the Nisei denied any romantic impli- 
cation and entitled it “Chivalry.” Nisei per- 
ceive the person in “Taxi” as being anxious 
in response to a member of his family “observing”
him, while most Caucasians make no
reference to any specific external source of
anxiety. Significantly more Caucasians than
Nisei (.01 level) projected abnormal sex be-
havior into “Taxi.” Refugee stories followed
U. S. Caucasian patterns.


1 Presented at the 1953 meeting of the Midwestern
Psychological Association, Chicago, Ill.

2 An extended report of this study may be ob-
tained without charge from Boris Breiger, Chicago
State Hospital, Chicago 34, Ill., or for a fee from
the American Documentation Institute. To obtain it
from the latter source, order Document No. 4749
from ADI Auxiliary Publications Project, Photo-
duplication Service, Library of Congress, Washing-
ton 25, D. C., remitting in advance $1.25 for micro-
film or $1.25 for photocopies. Make checks payable
to Chief, Photoduplication Service, Library of Con-
gress.

Subsequent use of this technique with hos- 
pitalized patients aided in the diagnosis of 
well-defended persons who appeared intellec- 
tually intact upon standard W-B testing. 
Such persons frequently project stories ex- 
pressing sex role confusion, prostitution fan- 
tasies, and paranoid ideation. 

The present work suggests that an inter- 
pretation of the PA test based solely upon 
weighted scoring may be misleading. The 
“correct” arrangement may cover up percep- 
tual distortions and other pathological proc- 
esses. Failure to verify predicted relation- 
ships between W-B and Rorschach factors 
having a common rationale (2) may be due 
to overlooking the projective functioning of 
certain W-B stimuli. Awareness of these im- 
plications offers a lead for research seeking 
to integrate psychometric and projective test 
data. 


Brief Report. 
Received November 1, 1955. 
Validity of the Marsh-Hilliard-Liechti MMPI
Sexual Deviation Scale in a State 
Hospital Population 


Roland M. Peek and Lowell H. Storms 


Hastings, Minnesota, State Hospital 


In a recent article in this Journal, Marsh, 
Hilliard, and Liechti (1) reported an MMPI 
scale of 100 items which served to pick up 88 
per cent of two samples of sexual psycho- 
paths at the expense of only 11 per cent false 
positives among UCLA students. However, a 
check on the utility of the scale for discrimi- 
nating sex offenders from neurotics and psy- 
chotics revealed overlap so great as to sug- 
gest ineffectiveness, and the authors say that 
perhaps “some factor of personality integra- 
tion or adjustment is being measured by the 
scale” (1, p. 58). This view is supported by 
their findings that high scorers on the scale 
obtained significantly higher F and lower K 
values. The present study was designed to in- 
vestigate the effectiveness of this scale in a 
hospital setting by (a) evaluating its power 
to discriminate sex offenders from nonoffend- 
ers in small samples from the same hospital 
population, (b) comparing the scores of both
groups with a nonhospitalized nonoffender 
group, and (c) comparing scores with other 
MMPI variables to test further the hypothe- 
sis that this scale is primarily a measure of 
gross personality integration. The present 
situation is less artificial than that of com- 
paring college students with already referred 
sexual psychopaths, since the present experi- 
mental design more closely approximates the 
actual clinical situation and types of subjects 
with which the scale would be utilized. 


Procedure 


The Marsh-Hilliard-Liechti scale was scored 
on available MMPI’s for three groups of subjects:
(a) a group of 13 sex offenders comprising
all those testable male patients in the
Hastings (Minn.) State Hospital whose commitments
were primarily for severe sexual misbehavior
(four molesters, two rapists, two uncontrolled
homosexuals, three exhibitionists, a
fetishist, and a voyeur). Their mean age was
39.7 with a range of 16 to 62. Nine carried
diagnoses of some form of psychosis, while
four were personality or character disorders.
Nine were single. (b) Thirty unselected male
patients representing all those taking the
MMPI consecutively during the period of the
study and whose ages fell between 16 and [illegible],
excluding the sex offenders, of course. Ages
ranged from 19 to 52 with a mean of 36.8.
In this group were 22 psychotics, three neu- 
rotics, and five personality and character dis- 
orders. Ten of the 30 were single, a signifi- 
cantly smaller proportion than in the above 
group (p < .05). (c) Thirty male psychiatric 
aides who took the MMPI as part of a rou- 
tine battery administered soon after employ- 
ment. Ages ranged from 20 to 56 with a mean 
of 31.3. Sixteen of the 29 whose marital status 
was known were single. 

Mean score and overlap comparisons be- 
tween the group of sex offenders and the 
patient and psychiatric aide groups were 
obtained, as well as comparisons with the 
original standardization groups (1). In the 
patient group, correlations were computed between
the sexual deviation scale and each
other MMPI scale.


Results 


Mean scores on the sexual deviation scale
were 43.31 (SD = 7.88) for the sex offenders,
38.77 (SD = 6.65) for the patients, and 29.70
(SD = 8.61) for the psychiatric aides.


Table 1

Discrimination of Sex Offender, State Hospital, and “Normal” Groups on the Sexual Deviation Scale,
Using Original and Modified Cutting Scores

(Entries are percentages)

                Sex offenders              Unselected
           -------------------------       State
                         Hastings State    Hospital     Psychiatric    College students
Cutting    M-H-L (1)     Hospital          patients     aides          M-H-L (1)
score      (N = 338)     (N = 13)          (N = 30)     (N = 30)       (N = 317)

>30           88            85                87           43             11
≤30           12            15                13           57             89
Total        100           100               100          100            100

>42           41*           77                33            7              1
≤42           59*           23                67           93             99
Total        100           100               100          100            100

* Interpolated from Marsh et al., Table 2 (1).


Table 1 presents percentages of the Marsh
et al. samples and the present groups falling
above and below the original cutting score of
30 and the cutting score of 42 found to dis-
criminate optimally between hospitalized sex
offenders and other patients. The original cut-
ting score, reported as picking up 88 per cent
of California sexual psychopaths, similarly
discriminates 85 per cent of hospitalized sex
offenders in the present study. While this dis-
crimination is attained at the expense of mis-
classifying only 11 per cent of the college stu-
dents, 87 per cent of our patients (almost
complete overlap) and almost half of the psy-
chiatric aides are misclassified as sex devi-
ates. The higher cutting score of 42 reduces
the false positives among patients to 33 per
cent and the false positives among college
students and psychiatric aides to 1 per cent
and 7 per cent respectively. However, this
score, which achieves maximal differentiation
in the hospital population, correctly identify-
ing 77 per cent of the hospitalized sex offend-
ers, selects only 41 per cent of the original
group of sexual psychopaths.

Using the chi-square test, differentiations
were significant at the 1% level between both
sex offender groups and both “normal” groups.
The discrimination between our sex offenders
and patients was also significant at the 1%
level, but only using the new cutting score of
42. All other comparisons were not significant,
including three of the four comparisons of
sex offenders with patients.

Correlations of the sexual deviation scale 
with MMPI scales computed for the 30 un- 
selected patients are presented in Table 2. 
It can be seen that this scale correlates sig- 
nificantly with a number of MMPI scales, the 
strongest relationships being with the validity 
scales F and K (about .60, p < .001). Other 
reasonably high correlations (.50 to .53, p 
< .01) were found with Mf, Ma, and Sc. The
lower correlations with Pd, Pt, and Hs were
significant at the 5% level.


Table 2

Correlations of Sexual Deviation Scale with Other
MMPI Scales in 30 State Hospital Patients

             Correlation      Significance
             with Sexual      of correlation
MMPI         Deviation        (t test)
Scale        Scale

L              -.30             ns
F               .60            .001
K              -.59            .001
Hs              .37            .05
D               .27             ns
Hy              .23             ns
Pd              .42            .05
Mf              .53            .01
Pa              .28             ns
Pt              .39            .05
Sc              .50            .01
Ma              .52            .01
Si              .20             ns


Discussion 


Our results confirm the incidental findings 
of Marsh et al. that their “sexual deviation”
scale does not discriminate sex offenders from 
neurotics and psychotics, throwing more doubt 
upon their main conclusions than they indi- 
cated. The mean score of our patients was 
almost as high as the means of the sex of- 
fender groups and considerably above the re- 
ported cutting point; the mean of our “nor- 
mal” group of psychiatric aides was almost at 
the cutting point they recommend. This cut- 
ting point fails completely to discriminate 
sex offenders from unselected state hospital 
patients, and even classifies two-fifths of the 
psychiatric aides with the committed sex of- 
fenders. The optimal cutting point for our 
groups provides better discrimination but the 
overlap remains unsatisfactorily large. Apply- 
ing the base-rate analysis of Meehl and Rosen 
(2), the use of this scale to increase predic- 
tive success becomes still more questionable. 
Assuming a liberal 90 per cent discrimination 
of sex offenders as against 11 per cent false 
positives in a group to be delineated from 
them, with a base rate of 2 per cent for sex 
offenders in a given population (our rate is 
1.6 per cent of the patients on the hospital 
rolls at the present time), 86 per cent of 
the persons classified as sex deviates by the 
scale will be false positives, actually not sex 
offenders. 
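
The base-rate arithmetic in the preceding paragraph can be checked with a short calculation. The sketch below is illustrative only: the function name and its parameter names are assumptions introduced here, while the three input figures (a 2 per cent base rate, 90 per cent discrimination, 11 per cent false positives) are the ones stated in the text.

```python
def false_positive_share(base_rate, hit_rate, false_positive_rate):
    """Share of all persons flagged by the scale who are not sex offenders.

    base_rate: proportion of sex offenders in the population
    hit_rate: proportion of sex offenders exceeding the cutting score
    false_positive_rate: proportion of nonoffenders exceeding it
    """
    true_positives = base_rate * hit_rate
    false_positives = (1 - base_rate) * false_positive_rate
    return false_positives / (true_positives + false_positives)


# Figures assumed in the text: 2% base rate, 90% hits, 11% false positives.
share = false_positive_share(0.02, 0.90, 0.11)
print(f"{share:.0%} of those classified as deviates are false positives")
```

With these inputs roughly 86 of every 100 persons exceeding the cutting score would be nonoffenders, which agrees with the figure given above.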

Although computed on a small N, the cor- 
relations are consistent with the hypothesis 
that the “sexual deviation” scale is a measure 
of personality disintegration and general ab- 
normality. The fact that the validity scales 
F and K correlate most strongly with the 
“sexual deviation” scale and in positive and 
negative directions respectively shows that 
higher scores on the sexual deviation scale 
are associated in this abnormal group with 
greater disorganization and lower defensive- 
ness. This is interesting in view of the fact 
that item overlaps between this scale and the 
MMPI scales are very low (for example, 
three items overlap with F; seven overlap 
with K, but three of these are scored in the 
opposite direction). 
Having confirmed the suspicion of Marsh
et al. that their scale might be a measure of 
gross adjustment, our results indicate further 
that it is not a measure of sexual deviation in 
any broad application. A measure which dif- 
ferentiates a special group from another group 
may measure not only what the members 
of the special group have in common, e.g., 
sexual deviation, but may measure what that 
group has in common with any number of 
other groups, in this case general abnormal- 
ity. It would be absurd, for instance, to call 
the Miller Analogies Test a Theoretical Physi- 
cist Scale because it discriminates theoretical 
physicists from high school graduates. 


Summary 


This study was performed to ascertain the 
validity of the Marsh, Hilliard, and Liechti 
Sexual Deviation Scale for the MMPI in 
groups of considerably different composition 
from their standardization samples and to 
check their incidental suggestion that the scale 
might be a measure of general personality 
integration. 

Groups of 13 sex offenders, 30 psychiatric 
patients, and 30 psychiatric aides with avail- 
able MMPI’s were scored on the Sexual 
Deviation Scale, and correlations between 
that scale and the other MMPI scales were 
obtained for the group of patients. Compari- 
sons were made among the present groups and 
between them and the original standardization 
groups using the above authors’ cutting score 
as well as another which was found optimal 
for the present study. 

The following results were obtained: 

1. Neither cutting score differentiated ef- 
fectively between patients and sex offenders; 
in fact, just as high a percentage of patients 
as of either group of sex offenders exceeded 
the cutting score recommended by Marsh 
et al. 

2. The original cutting score failed to dis- 
criminate psychiatric aides from sex offenders 
satisfactorily, although the new cutting score 
was a little more successful. 

3. Correlations of the scale with other 
MMPI scales support the hypothesis that the 
scale measures gross maladjustment or lack 
of personality integration. The strongest re- 
lationships are with the validity scales F and 
K, and are positive and negative, respectively.
High scorers on the “sexual deviation” scale 
tend to be more unstable, uncontrolled, con- 
fused, and prone to reality distortion. 

It was concluded that the scale has little 
utility for the purpose for which it was de- 
signed and would fail to select sex offenders 
from many populations, since most people 
with high scores would have other than sexual 
problems. This scale seems to function more 
as a measure of general abnormality than of 
deviant sexual trends. 

The inefficiency of such a scale in predicting
behavior which is so rare in almost any
population as are sex offenses was also dis- 
cussed. 


Received July 25, 1955. 
Cognitive, Affective, and Psychopathological
Correlates of the Taylor Manifest 
Anxiety Scale 
Though the Taylor Manifest Anxiety Scale
(MAS) (21) is widely used as a measure of 
manifest anxiety, it has never been satisfac- 
torily validated. Some attempts at validation 
have yielded positive results (4, 9), but others 
have yielded equivocal (10) or negative (2) 
findings. Moreover, recent research (11) sug- 
gests that a low score may not necessarily in- 
dicate lack of manifest anxiety, but rather 
S’s reluctance to admit it. 

On the positive side are the many experi- 
ments which confirm predictions made on the 
assumption that the scale is a measure of 
manifest anxiety. In some (16) of these ex- 
periments it was found that much manifest 
anxiety, as indicated by a high Taylor score, 
enhances the performance on simple learning 
tasks but interferes with complex learning 
tasks. However, experimenters (7, 11) who 
obtained negative correlations between Taylor 
scores and scores on intelligence tests raised 
the question whether differences found in per- 
formance on learning tasks as a function of 
high and low Taylor scores might not really 
be a function of differences in intelligence. 
In most of these experiments, however, the 
intelligence test consisted of a limited number
of tasks. Moreover, some of the tasks were
actually learning tasks, and others had time
limitations. In both cases a negative correlation
is to be expected (16, 13, 19). In the
present experiment Taylor scores were correlated
with scores on the Wechsler Adult Intelligence
Scale which includes a wide variety
of cognitive tasks, timed as well as untimed,
and only a few learning tasks.


1 From the Clinical Psychology Section of the
Neuropsychiatric Service at the Veterans Adminis-
tration Hospital, Bronx, N. Y. The author wishes to
express his appreciation to Drs. H. L. Flowers, Chief
of Neuropsychiatry and R. S. Morrow, Chief of
Clinical Psychology for their sustained interest and
support, and to Dr. Julia C. Hall, Assistant Chief
Clinical Psychologist in charge of research, for her
encouragement and help at every stage of this re-
search.


Procedure 

Several validation procedures were attempted.
In the first, the Taylor MAS scores
of 90 patients were correlated with their
scores on a manifest anxiety rating scale
(MARS). In the MARS, each patient was assigned
by his psychiatrist or psychologist a
score of 1, 2, or 3, corresponding to low,
medium, and high, on nine criteria of manifest
anxiety. These criteria are essentially the
same as those described by Buss et al. (4).
All the patients were in contact, and of at
least normal intelligence. None was suspected
of having cortical damage. In a further attempt
to validate the Taylor MAS, the Taylor
scores of patients who received a diagnosis of
anxiety neurosis were compared with the Taylor
scores of patients with other neurotic diagnoses
but who were free from symptomatology
of manifest anxiety. The Taylor scores of patients
with other psychiatric diagnoses, particularly
the psychopathic personalities, who
are supposed to have very little manifest
anxiety (1), were also of interest. In each
case the diagnosis was agreed upon by the
patient’s psychiatrist and consulting psychologist,
and all diagnostic “problem cases”
were excluded. In yet another but more indirect
attempt at validating the scale, the
Taylor scores of all 90 patients were corre- 
lated with their scores on a self-esteem scale 
(20). Since a variety of dynamically oriented 
theorists (14, pp. 182, 191) maintain that 
anxiety is a function of threat to one’s self- 
esteem, a negative correlation between the 
two scales was anticipated. 

To investigate the hypothesis that reluc- 
tance to reveal one’s anxiety symptoms to 
others is a significant source of variance in 
performance on the Taylor MAS, the scale 
was administered under two different condi- 
tions. One group of 23 psychiatric but non- 
psychotic patients took the test as part of a 
diagnostic testing procedure, which they knew
had serious consequences. Another similar
group of 33 patients took the scale in a 
group. They were told that since E was solely 
interested in the efficacy of the scale, and 
how people respond when they are free to 
be frank, they were not to indicate their 
names. Both groups were equated for age, IQ, 
and diagnosis. If reluctance to reveal one’s 
anxiety symptoms depresses one’s Taylor 
score, then the first group should obtain sig- 
nificantly lower scores than the second group. 

Finally, in order to investigate the relation- 
ship of Taylor MAS scores with IQ, the Tay- 
lor scores of 35 medical and psychiatric pa- 
tients, in whom brain damage and psychosis 
were definitely ruled out, were correlated with 
their WAIS IQ score. Their Taylor scores 
were also studied in relation to their scores on 
each of the eleven subtests, unadjusted as well 
as adjusted for IQ. Adjustment for IQ was
achieved by subtracting S’s weighted score on
a particular subtest from his mean weighted
score on the remaining subtests.


Table 1

Mean Taylor MAS Scores in Relation
to Psychopathology

Psychopathology               N     Mean      SD
Anxiety reaction             14    35.00     8.02
Other neuroses               18    22.89    12.04
Latent schizophrenia         17    22.53    12.17
Paranoid schizophrenia       15    26.27    11.44
Psychopathic personality     10    13.83     5.29
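
The IQ adjustment described in the Procedure (the mean weighted score on the remaining subtests, minus the weighted score on the subtest in question) can be sketched as follows. This is a minimal illustration, not the author's computation: the function name is an assumption, and the three subtest scores are hypothetical values chosen only to show the arithmetic.

```python
from statistics import mean


def iq_adjusted_scores(weighted_scores):
    """Return, for each subtest, the mean of the remaining subtests
    minus the weighted score on that subtest."""
    adjusted = {}
    for name, score in weighted_scores.items():
        others = [s for other, s in weighted_scores.items() if other != name]
        adjusted[name] = mean(others) - score
    return adjusted


# Hypothetical weighted scores for one S (not data from the study).
example = {"information": 12, "digit span": 8, "vocabulary": 13}
for subtest, value in iq_adjusted_scores(example).items():
    print(f"{subtest}: {value:+.1f}")
```

Under this reading a positive adjusted value marks a subtest on which S scored below his own average on the other subtests, so the adjusted score reflects relative rather than absolute performance.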


Results 


The correlation between the patients’ scores 
on the MARS and their Taylor scores was 
r = .34, which for 89 df is significant beyond 
the .01 confidence level. 

Table 1 indicates that of the 90 patients, 
14 received a diagnosis of anxiety neurosis, 
18 received other neurotic diagnoses, including
character neurosis, but were free of manifest
anxiety symptomatology, 15 were paranoid
schizophrenics, 17 were latent schizophrenics (18)
and 10 received a diagnosis of psychopathic 
personality, which does not include the other 
character disorders (1). Of the remaining 16 
patients, 5 had diagnoses which were not fre- 
quent enough to be grouped, 6 had neurotic 
diagnoses other than anxiety reaction but 
with manifest anxiety symptoms, and 5 were 
diagnostic “problem cases.” Table 2 indicates
that the patients with a diagnosis of anxiety
reaction had significantly higher scores than
any other group, and that the patients with a
diagnosis of psychopathic personality had significantly
lower scores than any other group.


Table 2

t Values of the Differences Between Mean Taylor MAS Scores of the Various
Psychopathological Groups

Psychopathological groups                                   t      df     p
Anxiety reaction vs. other neuroses                        3.24    30    <.01
Anxiety reaction vs. latent schizophrenia                  3.21    29    <.01
Anxiety reaction vs. paranoid schizophrenia                2.37    27    <.05
Psychopathic personality vs. anxiety reaction              6.97    22    <.01
Psychopathic personality vs. other neuroses                2.24    26    <.05
Psychopathic personality vs. latent schizophrenia          2.07    25    <.05
Psychopathic personality vs. paranoid schizophrenia        3.19    23    <.01
Paranoid schizophrenia vs. latent schizophrenia             .88    30
Paranoid schizophrenia vs. other neuroses                   .82    31


Table 3

Mean Taylor MAS Scores in Relation to Test
Identification and Test Anonymity

Group                   N     Mean      SD       t
Test identification    23    27.30    10.65
Test anonymity         33    31.64     7.96    1.82
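
A t value of the kind reported in Table 2 can be approximated from the group sizes, means, and SDs given in Table 1. The sketch below assumes an equal-variance (pooled) independent-samples t test and SDs computed with an n - 1 denominator; the source does not state which formula was used, so both are assumptions, as is the function name. It is applied to the anxiety reaction and other neuroses groups.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t from summary statistics.

    Assumes equal population variances and SDs based on n - 1.
    Returns the t value and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Anxiety reaction (N = 14) vs. other neuroses (N = 18), from Table 1.
t, df = pooled_t(35.00, 8.02, 14, 22.89, 12.04, 18)
print(f"t = {t:.2f}, df = {df}")
```

For this comparison the pooled formula gives a value close to the tabled 3.24 with 30 df. Other rows need not agree to the second decimal, since the tabled means and SDs are rounded and the original computing method is not reported.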

The correlation between the patients’ Tay- 
lor scores and their self-esteem scores was 
r = −.72, which for 89 df is highly significant.

Table 3 indicates that Ss who took the 
Taylor MAS in a group and did not identify 
themselves failed to obtain significantly 
higher scores than Ss who took the scale as 
part of a diagnostic testing procedure. This 
finding strongly suggests that reluctance to 
admit one’s anxiety symptoms is not a sig- 
nificant variable. 

Finally, there was no significant correla- 
tion between Taylor MAS scores and WAIS 
IQ scores (r = −.19, epsilon-square = .16).
Furthermore, none of the correlations (r’s and 
etas) between the Taylor MAS and any single 
WAIS subtest, adjusted as well as unad- 
justed for IQ, was significantly different from 
zero. 


Discussion 


Though all three validating procedures 
yielded positive results, the correlation be- 
tween the Taylor MAS and the MARS, the 
most direct criterion of manifest anxiety used 
in this study, is rather low. This finding is 
consistent with previous research which has 
had no difficulty in obtaining significant cor- 
relates of the Taylor MAS, but was much less 
successful in correlating the scale with direct 
criteria of manifest anxiety. In terms of a 
recent article by Cronbach and Meehl (5), 
these findings suggest that though criterion 
validation of the Taylor MAS yields ques- 
tionable results, it does appear to have con- 
siderable construct validity. 

There may be several reasons for the low 
correlation between the Taylor MAS and the 
MARS. One possible explanation is that the 
Taylor score represents the S’s evaluation of
his own symptoms, and it is conceivable that,
of two Ss who are rated alike in manifest anxiety,
one will be more sensitive to his own
psychological experiences and symptoms than 
the other. This hypothesis is supported by 
the following facts: (a) previous investigators 
have reported higher correlations between 
the Taylor MAS and the Ss’ reports of their 
own manifest anxiety symptoms than between
the Taylor MAS and the same but objectively
observed symptoms (4), (b) the
average MMPI profile of high Taylor scorers 
suggests that they are “introspective, quite 
sensitive to environmental press, and willing 
to admit to being easily disturbed. The low 
scorers, conversely, utilize both denial and 
repression fairly frequently and rarely intro- 
spect” (3, p. 435), and (c) several investi- 
gators (3, 6) have reported significant cor- 
relations between the Taylor MAS and the 
Pt scale of the MMPI, whose items refer to
“unreasonable fears as well as overreaction
to more reasonable stimuli” (8, p. 20). Another
possible explanation for the low correlation
between the Taylor scale and the
MARS is that the Taylor scale is loaded with 
chronic somatic manifestations of anxiety 
(headaches, stomach troubles, etc.) and thus 
it is biased in the direction of a particular 
manifestation of anxiety. In order to test this 
hypothesis, the Taylor MAS was submitted
to five experienced clinical psychologists who 
were asked to indicate which items reflected 
chronic somatic manifestations of anxiety and 
which reflected primarily psychic manifesta- 
tions of anxiety. A chi-square test was then 
used to determine which items of the Taylor 
MAS differentiated at the .10 level or better 
between those Ss who fell in the upper third 
and those Ss who fell in the lower third of 
the MARS scores distribution. Whereas 12 
items were judged as referring to chronic 
somatic manifestations of anxiety, only 1 such 
item (number 10) differentiated between the 
two groups at the .10 level (Table 4). Fur- 
thermore the correlation between the two sets 
of items with 59 df was only r = .65. These 
findings tend to support the hypothesis that 
the many items referring to chronic somatic 
symptoms are an undesired source of variance. 

140 Aron W. Siegman


Table 4

Taylor MAS Items Which Differentiated at the .10
Confidence Level or Better Between a Group
Which was Rated High and a Group
Which was Rated Low in
Manifest Anxiety

                                                      Confidence
Item                                                  level
 1. I am no more nervous than most others.             .10
 2. I work under a great deal of strain.               1
 3. I worry over money and business.                   .02
 4. I frequently notice my hand shakes when
    I try to do something.                             .02
 5. I worry quite a bit over possible troubles.        .01
 6. I am often afraid that I am going to blush.        .01
 7. I have nightmares every few nights.                .10
 8. At times I lose sleep over worry.                  .001
 9. I often dream about things I don’t like to
    tell other people.                                 .10
10. I have a great deal of stomach trouble.            .10
11. I am easily embarrassed.                           .02
12. I feel anxious about something or someone
    almost all of the time.                            .001
13. I am happy most of the time.                       .05
14. It makes me nervous to have to wait.               .10
15. At times I have worried beyond reason
    about something that really did not
    matter.                                            .05
16. I have been afraid of things or people
    that I know could not hurt me.                     .05
17. Sometimes I become so excited that I
    find it hard to get to sleep.                      .10
18. I am the kind of person who takes things
    hard.                                              .10
19. I am a very nervous person.                        .001
20. Life is often a strain for me.                     .01
21. I am not at all confident of myself.               .10
22. I am very confident of myself.                     .10


The absence of significant correlations between
the Taylor MAS and any one of the
eleven subtests of the WAIS raises an inter- 
esting problem, since many of these subtests 
are considered in clinical practice (15) as re- 
liable indices of manifest anxiety. In light of 
the fact that there is substantial experimental 
evidence (12, 17) for this clinical judgment, 
the finding is a challenge to the validity of 
the scale. However, two possible explanations 
may be offered. One is that the Taylor MAS 
is a measure of S’s general level of manifest 
anxiety, but in no way indicates how S re- 
sponded to a particular anxiety-stimulating 
task. Secondly, in clinical experience one sel- 
dom if ever relies on only one subtest, and it 
is conceivable that the Taylor scores are re- 
lated to performance on a combination of 
anxiety-sensitive subtests. In order to test 
this hypothesis the subtests were divided into 
anxiety tests, which include arithmetic, digit 
span, digit symbol, block design, and object 
assembly (15, 12, 17) and nonanxiety tests 
which include information, comprehension, 
similarities, vocabulary, and picture comple- 
tion. 

Table 5 indicates that the group which ob- 
tained high Taylor scores (36 and higher, 
which represents the 75th percentile and 
above) also obtained significantly lower scores 
on the anxiety tests than the group which 
obtained low Taylor scores (17 and less, 
which represents the 25th percentile and be- 
low). This finding is consistent with good 
clinical practice not to rely on a single sub- 
test. 


Summary 


A validational study of the Taylor MAS 
found that: 

1. Taylor MAS scores correlated r = .34 
with a manifest anxiety rating scale which 
for 89 df is significant beyond the .01 con- 
fidence level. It was hypothesized that the 
low correlation is due to the fact that the 
Taylor MAS contains too many items re- 
ferring to chronic manifestations of anxiety. 
This hypothesis was supported by an item 


Table 5

Mean Scores on Anxiety and Nonanxiety Subtests in Relation to Taylor MAS Scores

                             Mean              Mean
                             anxiety           nonanxiety
Group                  N     tests     SD      tests        SD      t†
High Taylor scorers    10    10.00     1.50    11.60        2.20    2.86*
Low Taylor scorers     10    10.75     1.98    11.77        2.38    1.33

* Significant at the .02 confidence level.
† Fisher's t for correlated means.
analysis of the scale. Individual differences in
self-awareness and sensitivity to one’s symp-
toms seem to be another source of variance
of the Taylor scale.

2. Patients with a diagnosis of anxiety re- 
action had significantly higher Taylor MAS 
scores than all other diagnostic groups, 
whereas patients with a diagnosis of psycho- 
pathic personality had significantly lower 
scores than all other diagnostic groups. 

3. Taylor MAS scores correlated −.72 with
scores on a self-esteem scale which for 59 df 
is significant beyond the .01 confidence level. 
It was suggested that the Taylor MAS has 
considerably more construct validity than
criterion validity. 

4. Patients who did not identify their Tay- 
lor MAS records did not obtain significantly 
higher scores than patients who did identify 
their records. 

5. Though there was no significant corre- 
lation between Taylor MAS scores and WAIS 
IQ scores, high Taylor scorers did obtain sig- 
nificantly lower scores on the anxiety-sensi- 
tive subtests than on the nonanxiety-sensitive 
subtests, whereas no such difference was noted 
for the low Taylor scorers. 
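
The significance statements in items 1 and 3 can be checked with the usual t test for a product-moment correlation, t = r * sqrt(df / (1 - r^2)). The sketch below is illustrative only: the function name is hypothetical, and the two-tailed .01 critical values of t (roughly 2.63 for 89 df and 2.66 for 59 df) come from standard tables, not from the study itself.

```python
import math


def t_for_correlation(r: float, df: int) -> float:
    """t statistic for testing a correlation against zero (df = N - 2)."""
    return r * math.sqrt(df / (1.0 - r * r))


# Correlations and degrees of freedom as reported in the summary.
t_rating = t_for_correlation(0.34, 89)    # Taylor MAS vs. anxiety ratings
t_esteem = t_for_correlation(-0.72, 59)   # Taylor MAS vs. self-esteem scale

print(round(t_rating, 2), round(t_esteem, 2))
```

Both values exceed the tabled .01 critical values in absolute size, which agrees with the significance levels stated in the summary.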


Received July 6, 1955. 
Reliability of Case History Ratings and
Intellectual Ability of Graduate Raters 


A. W. Bendig 


University of Pittsburgh 


Two studies have related educational level 
of raters to several measures of the reliability 
of ratings of clinical case histories (1, 2). 
Three rating measures were used: rater reli-
ability, which is the average of the N(N − 1)/2
intercorrelations among the N raters in
a group; psychiatric reliability, the average 
of the N correlations of each single rater’s 
judgments with the pooled judgments of psy- 
chiatrists of the same clinical material; and 
rater bias, which is a measure of the vari- 
ability among the N raters in the mean rat- 
ing each rater assigns all of the case histories 
he judges. Bendig and Sprague (2) found un- 
dergraduate raters to be lower in rater reli- 
ability and higher in rater bias than graduate 
students in psychology and practicing clini- 
cal psychologists. Bendig (1) used a dif- 
ferent set of case histories and reported no 
significant differences between groups of un- 
dergraduate and graduate Ss in rater and 
psychiatric reliability, but again the under- 
graduate raters demonstrated a significantly 
greater variability in bias measures (1, p. 
129). 
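
The three rating measures defined above can be illustrated with a short sketch. It computes them directly from their verbal definitions, whereas the studies cited obtained rater reliability and rater bias from analyses of variance; the ratings, the pooled criterion, and all function names are hypothetical, and the variance of rater means is used only as a simple stand-in for the bias index.

```python
import math
from itertools import combinations
from statistics import mean, pvariance


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rater_reliability(ratings):
    """Average of the N(N - 1)/2 intercorrelations among N raters."""
    return mean(pearson(a, b) for a, b in combinations(ratings, 2))


def psychiatric_reliability(ratings, criterion):
    """Average correlation of each rater with the pooled criterion ratings."""
    return mean(pearson(r, criterion) for r in ratings)


def rater_bias(ratings):
    """Variability among raters in the mean rating each assigns (assumed index)."""
    return pvariance([mean(r) for r in ratings])


# Hypothetical data: 3 raters judging 5 cases on a seven-category scale.
ratings = [
    [1, 2, 4, 5, 7],
    [2, 3, 5, 6, 7],
    [1, 1, 3, 4, 6],
]
criterion = [1, 2, 4, 6, 7]  # hypothetical pooled psychiatric judgments

print(rater_reliability(ratings))
print(psychiatric_reliability(ratings, criterion))
print(rater_bias(ratings))
```

In this toy example the raters order the cases almost identically, so reliability is high, yet their mean ratings differ, so the bias index is not zero. This is the distinction between relative and absolute ratings on which the study turns.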

These differences in rater bias can be as- 
sumed to be the result of the educational dif- 
ferences between the Ss, but whether less bias 
is due to greater training in psychology, 
higher intellectual level, or increased interest 
in clinical work with patients among gradu- 
ate raters is unknown. 


Procedure 


Case histories. The rated stimuli were the 
10 abstracted clinical case histories used in 
a previous study (1). The cases varied in 
length from one to two pages of single-spaced 
typing and had been abstracted from com- 
plete case histories in clinic files. The clinical 
subjects of the cases were all white males and 
varied in adjustment level from relatively 
normal to hospitalized schizophrenic. The ab- 
stracted cases were mimeographed as a book- 
let with a face sheet containing rater instruc- 
tions, the rating scale to be used, and spaces 
for recording S’s judgments. The scale was a 
seven-category scale with the center and end 
categories verbally anchored by the descrip- 
tive phrases “slightly maladjusted,” “moder- 
ately maladjusted,” and “extremely malad- 
justed.” The intermediate scale categories 
were left unanchored as was done in previous 
use of this scale (1, 2). 

Subjects. The raters were 40 graduate stu- 
dents in psychology, ranging in graduate 
training from first- to fourth-year students. 
Of this group 25 had expressed their major 
area of professional interest as clinical or 
counseling psychology, with the remaining 15 
having their major interest in other areas, 
i.e., experimental, comparative, social, psy- 
chometrics, etc. Most of the Ss (N = 27) had 
taken the three-test graduate school entrance 
battery described in detail by Jenson (5) and 
these scores were available from the files of 
the University Testing Service. The case his- 
tory booklets and rating scales were distrib- 
uted to the Ss in their graduate classes and 
the completed ratings collected at a subse- 
quent class period. 

Rater variables. The 27 Ss who had taken 
the graduate test battery were dichotomized 
on the basis of their raw scores on four intel- 
lectual variables: the Miller Analogies test, 
a Mathematical Aptitude test, a Reading 
Comprehension test, and a measure of Re- 
search Aptitude. This last variable had been 
Table 1

Reliability of Case History Ratings of Graduate Psychology Students Dichotomized on Six Rater Variables

                                   Rater                                            Psychiatric
                                   reliability           Rater bias                 reliability
                  Rater
Rater variable    group    N       r             t       r             t            r        t
Miller Analogies  High     13      [illegible]   1.38    .69**         2.77**       .86**    .69
                  Low      14      .86**                 .44*                       .89**
Mathematical      High     14      .82**         1.23    .54*          .78          .87**    .53
  aptitude        Low      13      .86**                 .61**                      .88**
Reading           High     14      .84**         .31     .64**         .04          .87**    .30
  comprehension   Low      13      .85**                 .55*                       .88**
Research          High     15      .85**         .62     [illegible]   1.60         .89**    .86
  aptitude        Low      12      [illegible]           .43                        .86**
Graduate credits  High     20      .87**         1.69    [illegible]   3.62**       .89**    1.15
                  Low      20      [illegible]           .49*                       .86**
Clinical interest High     25      [illegible]   .85     .63**         .01          .88**    .38
                  Low      15      [illegible]           .63**                      .87**

* Significant at the .05 level.
** Significant at the .01 level.


developed by Hyman (4) through factor ana- 
lytic procedures as a predictor of combined 
criteria of research ability and consisted of 
a linear combination of subscales from the 
Reading Comprehension test. The total group 
of 40 Ss was also dichotomized on the basis 
of the total number of graduate credits earned 
in psychology and on the basis of their ex- 
pressed clinical interest. 


Results 


The Ss were dichotomized separately on 
each of the six rater variables and three reli- 
ability measures computed for each pair of 
groups. Two of these measures, rater reli- 
ability and rater bias, were obtained by 
analyses of variance of the ratings (1, 2). 
Rater reliability is the average intercorrela- 
tion among the raters in the ratings assigned 
to the stimuli, while rater bias is a measure 
of the individual differences among the raters 
within a group to consistently overrate or 
underrate all of the cases. The third measure, 
psychiatric reliability, was obtained by cor- 
relating each S’s ratings with the pooled 
judgments of six psychiatrists who had also 
evaluated the 10 cases (1). The average cor- 
relation of each group of Ss with the psychia- 
trists was obtained by finding the mean of 
the transformed (r-to-z) correlations within 
that group (3, pp. 133-134). 

The pairs of rater reliability and rater bias 
measures for each dichotomized group of 
raters were tested for significant differences 
by the procedure given by Edwards (3, p.
136), using as the degrees of freedom for
each transformed coefficient one less than the
number of degrees of freedom of the error
term in the above noted analyses of variance.
Pairs of psychiatric reliability coefficients were
tested by similar procedures, using as the de-
grees of freedom for each averaged correla-
tion the sum of the df for each S (df = 7 for
one S). The results of these comparisons can
be found in Table 1. 
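
The averaging and testing steps just described can be sketched as follows. This is a minimal illustration, not a reconstruction of the Edwards procedure: it assumes that the standard error of a difference between two transformed coefficients is sqrt(1/df1 + 1/df2), and the function names are hypothetical. The example uses the psychiatric reliability entries for the Reading Comprehension split, read from Table 1 as .87 (14 Ss) and .88 (13 Ss), with 7 df per S.

```python
import math


def mean_r_via_z(rs):
    """Average correlations by the r-to-z transformation, then transform back."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


def z_difference_ratio(r1, df1, r2, df2):
    """Difference between two transformed coefficients over its standard error.

    Assumes SE = sqrt(1/df1 + 1/df2), with the df supplied by the caller.
    """
    diff = math.atanh(r1) - math.atanh(r2)
    return diff / math.sqrt(1.0 / df1 + 1.0 / df2)


# Psychiatric reliability, high vs. low Reading Comprehension groups.
t = abs(z_difference_ratio(0.87, 7 * 14, 0.88, 7 * 13))
print(round(t, 2))
```

Under these assumptions the ratio comes out close to the tabled value of .30 for this comparison. Exact agreement should not be expected for every row, since the published coefficients are rounded to two places.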

None of the dichotomized group differences 
in rater or psychiatric reliability was signifi- 
cant at the .05 level of confidence and only 
one of these twelve comparisons, that between 
Ss high and low in the number of earned 
graduate credits, was significant at the .10 
level (t = 1.69). Consequently it seems rea-
sonable to conclude that intellectual and in-
terest differences between these graduate Ss
had no effect upon their reliability in judging
clinical case histories. However, two of the
six comparisons on rater bias were significant
at the .01 level. The Ss with high scores on
the Miller Analogies test showed significantly
more rater bias (t = 2.77) and Ss having
earned more graduate credits also showed
more bias (t = 3.62). The significant differ-
ences between Ss dichotomized on the basis
of these two rater variables cannot be ex- 
plained by a relation between Miller Analogies 
test scores and number of graduate credits 
since the product-moment correlation between 
these two variables for 27 Ss was only −.05.


Discussion 


Apparently intellectual, training, and inter- 
est differences among graduate students have 
no effect upon the reliability of their ratings 
of clinical case histories, but do affect meas- 
ures of rater bias. One S may rank order the 
10 cases about the same as another S, but the 
first rater spreads the cases along the low or 
adjusted end of the rating scale continuum 
while the second rater distributes the same 
case histories along the high or maladjusted 
portion of the scale. Raters with high Miller 
Analogies scores tended to vary more among 
themselves in the central tendency of their 
ratings than did Ss with low scores. Individual
differences in the portion of the scale used 
for rating are also greater for more educa- 
tionally advanced graduate students. This 
greater rater bias is possibly a function of 
the divergent connotations of the global con- 
cepts of “adjustment” among more intellec- 
tually able and more academically experienced 
graduate Ss. For one rater the reported case 
behavior has to be quite deviant before he is 
willing to judge it as maladjusted, while the 
same behavior is judged much more strin- 
gently by another rater. Although these two 
rater variables, one intellectual and one edu- 
cational, are related to the absolute adjust- 
ment level ratings given by graduate Ss to 
case histories, they are not related to the 
relative ratings assigned by the raters. 


Summary 


Graduate students in psychology (N = 40) 
rated ten abstracted clinical case histories for 
global adjustment level using a seven-point 
rating scale. The raters were then dichoto- 
mized on the basis of their scores on graduate 
school entrance tests, the number of earned 
graduate credits in psychology, and their ex- 
pressed interest in clinical psychology. Meas- 
ures of rater reliability and bias were com- 
puted for each of the dichotomized groups 
and the measures of reliability and bias com-
pared by t tests. None of the intellectual, edu-
cational, or interest variables were related to
rater reliability, but raters high on the Miller 
Analogies Test and with a greater number of 
earned graduate credits showed significantly 
(.01 level) greater rater bias. 


Received August 8, 1955. 
Body Cathexis as a Factor in Somatic Complaints¹,²


Laverne C. Johnson 


Washington University School of Medicine and Research Laboratories of 
Malcolm Bliss Hospital 


In recent work Secord (7) and Secord and 
Jourard (8) have focused attention upon atti- 
tude toward the body as an important vari- 
able in any comprehensive theory of person- 
ality development. They, like Murphy (6), 
conceive of the body as an anchorage point 
for the more inclusive concept of the self and 
by use of objective measures of body cathexis 
(BC) and self cathexis (SC) have found a 
significant relationship between the two. Body 
cathexis was defined by Secord and Jourard 
as the degree of feeling of satisfaction or dis- 
satisfaction with the various parts or proc- 
esses of the body. This definition will be used 
in this paper. In addition to cross validation 
of the relationship of body and self cathexis, 
this paper: (a) presents information concern- 
ing the stability of attitude toward the body 
and self, and (b) investigates the relationship
of body cathexis to somatic complaints. 

Even though Freud related hypochondriasis 
to an excessive attraction of interest onto the 
sensations and functions of bodily organs, 
few studies have been reported relating atti- 
tude toward the body to somatic complaints. 
Levy (3) investigated body interest and hy- 
pochondriasis in 20 children, but he was more 
interested in the types of body concern and 
causes of excessive body interest than he was 
in comparing these 20 children to a control 
group. 

Secord and Jourard (8) utilized a homonym 
test composed of words having meanings per- 


1 This study was supported in part by a research 
grant from the National Institute of Mental Health 
of the National Institutes of Health, United States 
Public Health Service. 

2 Part of these data was gathered under Air Force 
Contract Number 18(600)-927 with the Department 
of Clinical Psychology, United States Air Force 
School of Aviation Medicine. 


taining to the body and also meanings not re- 
lated to the body to study this question. The 
subject responded with the first word that 
came to him after each oral presentation of 
the homonym. A significant relationship be- 
tween the number of responses having bodily 
meanings and BC was found. These results, 
Secord and Jourard believe, confirmed their 
hypothesis “that low BC is associated with 
anxiety in the form of undue autistic concern 
with pain, disease, or bodily injury” (8, p.
343). While these results indicate a relation- 
ship between BC and bodily concern when 
measured by a word-association technique, 
the findings do not answer the question 
whether BC will be related to the actual 
number of somatic symptoms experienced and 
reported. Will dissatisfaction with and, pos- 
sibly, concern over body appearance and body 
functioning be related to the number of 
somatic symptoms reported on a health ques- 
tionnaire? The question is investigated in the 
present study. 


Subjects 


Fifty-two males and ninety-five females 
with mean ages of 20.2 and 20.3 comprise the 
sample studied. As part of a larger research 
project,³ the fifty-two males were drawn from
the freshman class of a local Protestant semi- 
nary. These students were selected from the 
total class, N = 183, solely on the basis of 
their electroencephalographic records. None 
of the EEG records of the subjects chosen 
would be classified as abnormal. The ninety- 
five females were student nurses receiving 
their psychiatric training at Malcolm Bliss 
Hospital. 


3 Air Force Contract Number 18(600)-927,
Psychiatric Screening of Flying Personnel.


Procedures


Each subject was administered the BC and 
SC scales and the Cornell Medical Index 
Health Questionnaire. The male subjects were 
administered the tests as part of the more 
extensive battery when they came to our hos- 
pital laboratories. 


The Tests 


The BC and SC scales were very similar to 
those used by Secord and Jourard (8). In- 
structions for the scales were changed to re- 
duce the tendency to response sets. The fol- 
lowing instructions appeared on the cover 
page: 


On the following pages are listed a number of 
things characteristic of yourself or related to you. 
You are asked to indicate which things you worry 
about and would like to change if it were possible, 
and which things you have no feelings about one 
way or the other. 

Consider each item listed below and encircle the 
number which best represents your feelings about 
yourself now according to the following scale: 1. 
Strongly dislike and wish change could somehow be 
made; 2. Don’t like, but can put up with; 3. Have 
no particular feelings one way or the other; 4. Defi- 
nitely like, am pleased with; 5. Consider myself par- 
ticularly and unusually fortunate. So that you will 
be able to judge each item carefully in terms of the 
above five statements, the scale will be at the top 
of each page. You may refer back to the scale as 
often as necessary to make your judgment of how 
you feel. Judge each item carefully. Do not use the 
same number for each item. 


The items and scoring procedure were es- 
sentially those used by Secord and Jourard. 
Organs pertaining to sexual functions were 
not excluded as was done by Secord and 
Jourard (8). 

These instructions were very successful in 
reducing response sets. The criteria used in 
the original study to measure response set 
were as follows: (a) a frequency > 32 in 
category 4; (b) a frequency > 28 in cate- 
gory 5; and (c) a frequency > 24 in cate- 
gory 5 accompanied by less than two re- 
sponses in categories 1 and 2 combined. 
Applying these criteria to the present sam- 
ple, only 12, 8 men and 4 women, of the 147 
subjects would be suspected of response sets. 
When compared to the 37 of the 126 subjects 
eliminated by Secord and Jourard, response 
set does not seem to be a problem in this
sample using these instructions. 
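
The three response-set criteria quoted above lend themselves to a simple check. The sketch below applies them to one subject's list of ratings; the function name and the sample records are hypothetical, and the two proportions at the end merely restate the counts given in the text (12 of 147 here, 37 of 126 in the original study).

```python
from collections import Counter


def suspected_response_set(ratings):
    """Apply criteria (a)-(c) to one subject's list of 1-5 ratings."""
    freq = Counter(ratings)
    low = freq[1] + freq[2]  # responses in categories 1 and 2 combined
    return (
        freq[4] > 32                    # (a) more than 32 responses in category 4
        or freq[5] > 28                 # (b) more than 28 responses in category 5
        or (freq[5] > 24 and low < 2)   # (c) more than 24 in category 5, under two low
    )


# Hypothetical records, for illustration only.
mostly_fours = [4] * 33 + [3] * 10
varied = [3] * 20 + [4] * 20 + [2] * 5
print(suspected_response_set(mostly_fours), suspected_response_set(varied))

# Proportions suspected of response set, from the counts in the text.
present_rate = round(100 * 12 / 147, 1)
original_rate = round(100 * 37 / 126, 1)
print(present_rate, original_rate)
```

Expressed as percentages, the counts in the text amount to roughly 8 per cent of the present sample against roughly 29 per cent of the original one.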

The Cornell Medical Index Health Ques- 
tionnaire consists of a series of questions 
which the subject is to answer yes or no. As 
the name implies, the items are concerned 
with symptoms and diseases past and present. 
The particular form used in this study is that 
used by the Air Force (1). With the excep- 
tion of containing fewer questions, 171 in 
contrast to 195, the test used here is identi- 
cal to the published Cornell Medical Index 
Health Questionnaire. Both men and women 
took the same form. The score on the Health 
Questionnaire consisted of the number of 
symptoms or diseases reported as indicated 
by the sum of the yes responses. 


Results 


The three scores obtained from the BC-SC 
scales were: total BC, total SC and a score 
called an “anxiety indicator” by Secord and 
Jourard. The first two scores were a total of 
the ratings for each item on a 5-point scale 
for each individual divided by 50 and 52 re- 
spectively to get the average BC and SC rat- 
ing. A low score on the BC or SC scale indi- 
cates that the individual has a low attitude 
toward his body or self and wishes change 
could be made. A high BC or SC score on 
the other hand indicates the individual is 
pleased with his body or self. A low rating 
then is indicative of negative cathexis toward 
the body or self and a high score is indicative 
of positive cathexis. The “anxiety indicator” 
score was obtained by summing the ratings 
for each male on the 13 most negatively 
cathected items by the males and the 14 most 
negatively cathected items for the females. 
The sums were then divided by 13 and 14 to 
get an average rating. 
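
The scoring just described can be sketched in a few lines. The example is hypothetical throughout: the item names and ratings are invented, only four items are shown, and the rule used to pick the most negatively cathected items (lowest mean rating within the group) is an assumption, since the text does not spell out how the items were selected.

```python
from statistics import mean


def average_rating(subject):
    """Total of a subject's item ratings divided by the number of items."""
    return sum(subject.values()) / len(subject)


def most_negative_items(group, k):
    """Names of the k items with the lowest mean rating in a group (assumed rule)."""
    item_means = {name: mean(s[name] for s in group) for name in group[0]}
    return sorted(item_means, key=item_means.get)[:k]


def anxiety_indicator(subject, items):
    """Average rating over the selected negatively cathected items."""
    return sum(subject[name] for name in items) / len(items)


# Hypothetical group of three subjects rated on four items (1-5 scale).
group = [
    {"strength": 2, "teeth": 3, "eyes": 4, "posture": 2},
    {"strength": 1, "teeth": 4, "eyes": 5, "posture": 3},
    {"strength": 2, "teeth": 3, "eyes": 4, "posture": 2},
]
items = most_negative_items(group, 2)
print(items)
print(average_rating(group[0]), anxiety_indicator(group[0], items))
```

In the study itself the corresponding divisors were 13 items for the males and 14 for the females.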


Items most negatively cathected by the males 
were: strength, body build, muscles, sex activities, 
weight, waist, energy level, width of shoulders, hips, 
teeth, facial complexion, posture and nose. The most 
negatively cathected items for the females were: 
hair, facial complexion, nose, waist, body build, ap- 
pearance, hips, legs, teeth, feet, knees, posture, face, 
weight. These items are almost identical to those re- 
ported by Secord and Jourard, and as in Levy’s (3) 
study the items most negatively cathected indicate 
concern over body appearance rather than body 
functioning. Also, as reported by Levy, women are 
primarily concerned with appearance while men in
addition to appearance are concerned with physical 
prowess. 


As Taylor Manifest Anxiety scores were 
available as part of the total test battery, it 
was possible to obtain the correlation be- 
tween the Taylor scale scores and the most 
negatively cathected items, the “anxiety in- 
dicator” score. Results indicate an inverse re- 
lationship between “anxiety indicator” scores 
and manifest anxiety. The correlations were 
−.40 for men and −.53 for women. Low
attitude toward body appearance is signifi- 
cantly associated, beyond the .01 level, with 
manifest anxiety. The difference between the 
correlations for men and women was not sig- 
nificant. 
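
The statement that the two correlations do not differ significantly can be checked with the ordinary test for correlations from independent samples. The sketch below is an assumption about the method, which the text does not name: each r is transformed to z, and the difference is divided by sqrt(1/(n1 - 3) + 1/(n2 - 3)). The function name is hypothetical.

```python
import math


def z_independent_correlations(r1, n1, r2, n2):
    """z ratio for the difference between correlations from two independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


# -.40 for the 52 men and -.53 for the 95 women, as reported above.
z = z_independent_correlations(-0.40, 52, -0.53, 95)
print(round(z, 2))
```

The ratio falls well short of 1.96, the two-tailed .05 value of the normal deviate, which is consistent with the report of no significant sex difference.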


Reliability and Relations of BC and SC 


The split-half reliability reported by Secord 
and Jourard for the BC and SC scales indi- 
cated the internal consistency of the scale 
but gave no indication as to the stability of 
attitude toward the body and self over a pe- 
riod of time. Retesting of the 52 male sub- 
jects after a 6 to 8 week interval made pos- 
sible the obtaining of the test-retest coeffi- 
cients of reliability. These were .72 for BC, 
.74 for SC, and .76 for the “anxiety indi- 


Table 1

Means, Standard Deviations, and Comparison of
Difference Between Means for Sexes
(N = 52 Males; 95 Females)

Test                        Mean       SD            CR
Body cathexis
   Males                    3.44       [illegible]
   Females                  3.27       .33           [illegible]*
Self cathexis
   Males                    3.39       .53
   Females                  3.38       .37           NS
Anxiety indicator
   Males                    3.19       .67
   Females                  2.86       .53           3.06**
Cornell Medical Index
   Males                    88.02†     44.36         3.53**
   Females                  112.74†    32.28

* Significant beyond .05 level.
** Significant beyond .01 level.
† Log transformed scores.


Table 2

Intercorrelations Between BC Scores and
Cornell Medical Index

                            Body        Anxiety
Test                        cathexis    indicator
Cornell Medical Index
   Males                    −.33*       [illegible]
   Females                  −.40**      −.54**

* Significant beyond .05 level.
** Significant beyond .01 level.


cator’’ scale. These figures indicate acceptable 
test-retest reliability for the three measures.

The relationship of BC to SC as indicated 
by Pearson product-moment correlations was 
.66 for males and .79 for females. These cor-
relations support the findings by Secord and
Jourard that attitude toward the body is re- 
lated to attitude toward the self. 


Sex Differences 


The means, standard deviations, and com-
parison of difference between means for the
two sexes are listed in Table 1. As the dis-
tribution of scores on the Cornell Medical
Index was skewed with the variability being 
almost equal to the mean, these scores were 
converted to log scores and these trans- 
formed scores were used for all statistical 
analysis (5). 
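
The critical ratios in Table 1 can be approximated from the tabled means and standard deviations. The sketch assumes the conventional formula, the difference between means divided by sqrt(sd1^2/n1 + sd2^2/n2); the paper does not state which formula was used, and the function name is hypothetical. The inputs are the Table 1 entries for the anxiety indicator and the Cornell Medical Index.

```python
import math


def critical_ratio(m1, sd1, n1, m2, sd2, n2):
    """Difference between two means divided by its (unpooled) standard error."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se


# Anxiety indicator: males (N = 52) vs. females (N = 95).
cr_anxiety = critical_ratio(3.19, 0.67, 52, 2.86, 0.53, 95)

# Cornell Medical Index: females vs. males.
cr_cornell = critical_ratio(112.74, 32.28, 95, 88.02, 44.36, 52)

print(round(cr_anxiety, 2), round(cr_cornell, 2))
```

Both results agree with the tabled critical ratios of 3.06 and 3.53 to within about .01, which suggests the table was computed in this way or in a closely similar one.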

The student nurses’ attitude toward their 
bodies was significantly lower than that of 
the seminary students on the total BC scale 
and on the “anxiety indicator” scale. Differ- 
ence between the sexes on the latter which 
consists of items pertaining primarily to body 
appearance was particularly striking. When 
cutting scores of 2.8 or below and 4.0 or 
above are used for dividing the two groups, 
61 of the women, 64%, in contrast to 22, 
42%, of the men have scores below 2.8. Only 
2 of the women, 2%, have scores of 4.0 or 
above while 7, 13%, of the men have scores 
in this range. Thus 64% of the women, in 
contrast to 42% of the men, state they defi- 
nitely dislike the appearance of their body 
and only 2%, in contrast to 13%, state they 
are pleased with their appearance. Signifi- 
cantly more symptoms were admitted to by 
the student nurses on the Cornell Medical 
Table 3

Intercorrelations of Somatic Items on Cornell
Medical Index and BC

                            Body        Anxiety
Test                        cathexis    indicator
Somatic items
   Males                    −.24        −.31*
   Females                  −.30**      −.44**

* Significant beyond .05 level.
** Significant beyond .01 level.


Index, but there was no difference between 
the sexes on the SC scale. 


Relationship of BC to Somatic Symptoms 


The correlations between BC and the Cor- 
nell Medical Index for both sexes are listed 
in Table 2. 

All correlations are significant beyond .05 
level, and all but one are significant beyond 
the .01 level indicating in this sample a 
moderate inverse relationship between atti- 
tude toward the body and number of symp- 
toms reported. While the correlations for 
women are higher than those for men the dif- 
ferences are not significant. For both sexes 
there is a higher relationship between the 
“anxiety indicator” scores and number of 
symptoms reported than when the total BC 
score is used. This difference is significant for 
women at the .05 level when checked by z 
transformation using the technique for differ-
ence between correlated z’s (4, p. 148). The
difference for the men is not significant. 

As the Cornell Medical Index is composed 
of items pertaining to both somatic and 
psychological symptoms the question arose 
whether the intercorrelations were a result of 
psychological complaints rather than somatic 
ones. To check this the Cornell Medical In- 
dex was rescored for somatic symptoms only 
and these scores were correlated with BC. 
These results are summarized in Table 3. 
While the correlations between BC and only 
somatic symptoms are lower, all are signifi- 
cant with the exception of total BC and 
somatic items for the men. This reaches the 
.09 level.

Discussion


This study cross validates Secord’s and 
Jourard’s finding that attitude toward the 
body is a significant factor in attitude to- 
ward the self. Zachry (10) believes that one 
of the most important tasks of adolescence is 
the acceptance of the body “as a symbol of 
self.” These results suggest that the attitude 
toward one’s body is associated with accept- 
ance of self, not only in adolescence, but also 
in young adults, and perhaps at all ages. 

Results indicating a moderate relationship 
between attitude toward the body and somatic 
complaints renew interest in Freud’s theory
of hypochondriasis. In his chapter on narcis- 
sism Freud states: “The hypochondriac with- 
draws both interest and libido—the latter 
especially markedly—from the objects of the 
outer world and concentrates both upon the 
organ which engaged his attention” (2, p. 
40). Freud believed that this withdrawal 
from external objects and excessive attrac- 
tion of interest onto the sensations and func- 
tions of body organs was due to a damming 
back of the libido into the internal organs. 
This paper was not designed specifically to 
test Freudian theory, and the subjects have 
not been clinically diagnosed as hypochon- 
driac, but the results support the hypothesis 
that attitude toward the body is associated 
with somatic complaints. 

The significantly lower BC scores for 
women suggest the attractive hypothesis that 
because of the social importance of the fe- 
male body women are more critical of their 
bodies and are likely to be more concerned 
over body appearance and functioning than 
are men. Jourard and Secord (9) investigated 
BC and the ideal female figure and found 
that women’s satisfaction with aspects of 
their bodies varies with the magnitude of the 
deviation between measured size and what 
they consider ideal size. They also found that 
many women feel attainment of “ideal” pro- 
portions is difficult, if not impossible. It cer- 
tainly seems plausible then that, insofar as 
this “ideal” is internalized by women, fail- 
ure to attain it would result in low body 
cathexis. The small percentage of both men 
and women who stated they were pleased 
with the appearance of their bodies indicates 
how difficult it is to measure up to the 
“ideal.” 

However, certain factors suggest caution in 
accepting the above view completely. Our re-
sults are not compatible with those of Secord
and Jourard (8) with regard to sex differences.
They found no difference between male and 
female on either the body cathexis scale or 
homonym test of body concern. 

When the BC-SC scores of this study were 
compared with those of Secord and Jourard, 
no difference was found on the SC scores for 
either sex or for men on the BC scale. How- 
ever, a significant difference, at the .008 prob- 
ability level, was found between the student 
nurses and college women on total BC. The 
difference between the two female groups was 
not significant on items pertaining primarily 
to appearance, “anxiety indicator” scale. 
Three factors may be related to these differ- 
ences between the sexes and between the 
women of the two studies: age, intellectual 
level, and training. 

Age was almost identical for the sexes in 
this study and is similar to that of college 
sophomores and thus safely can be ruled out 
as an important variable contributing to the 
differences. Intellectual level cannot be as 
safely excluded but as the mean IQ of the 
student nurses was 112, they appear to be 
not too dissimilar, intellectually, from stu- 
dents in many liberal arts colleges. Perhaps 
the most significant difference between the 
sexes, and the women of this study and those 
of Secord and Jourard’s, is the training and 
duties of the student nurse. Levy (3) noted 
that one factor causing excessive body con- 
cern and interest in his sample of children 
was exposure to illness or being exposed to 
frequent discussions of physical ills. It is a 
common phenomenon for medical students to 
be stricken with various ailments and become 
concerned with the functioning of their bodies 
when they begin their clinical studies. Per- 
haps a similar phenomenon is present in the 
student nurses and their low scores on total 
BC and the larger number of symptoms re- 
ported is due in part to their exposure to 
and interest in illness. 
Summary


The purposes of this study were (a) to 
cross validate the findings of Secord and 
Jourard that attitude toward the body is re-
lated to attitude toward the self; (b) to ascer-
tain the stability of BC and SC; and (c) to
investigate the relationship of BC to somatic 
complaints. Fifty-two male seminary students 
and ninety-five student nurses were adminis- 
tered the BC and SC scales and the Cornell 
Medical Index. A significant relationship was 
found between BC and SC, cross validating 
Secord’s and Jourard’s findings. Attitude to- 
ward the body and toward the self was found 
to be stable over a period of time. A moder- 
ate inverse relationship was found between 
attitude toward the body and number of so- 
matic symptoms reported. Women had lower 
BC scores than men and reported a larger 
number of symptoms. The social importance 
of the female body and the training and in- 
terest of the student nurses in illness are of- 
fered as possible explanations. 


Stability of the WISC and Binet Tests 


Ila H. Gehman 


State Hospital at Butner, N. C.¹ 


and Rudolph P. Matyas 


Pennsylvania State University 


The present study explored the stability of 
the Wechsler Intelligence Scale for Children 
and the Revised Stanford-Binet Scale, Form 
L, as measures of intelligence over a four-year 
time interval. No information has been made 
generally available to show the long-term re- 
liability of the WISC. The 1937 edition of the 
Binet has not been extensively investigated 
with general populations in this respect, but 
Matyas’ (3) survey of the literature sug- 
gested that an r of approximately .73 was 
typical of test-retest correlations when the in- 
tervals between tests were of such duration 
as to minimize practice effects. 

It was hypothesized that intelligence quo- 
tients derived from the WISC would be found 
to be as constant upon retest after a period of 
four years as would those derived from the 
Binet. The examiners were also interested in 
checking upon the relative efficiency of ad- 
ministration of the two scales, and kept ac- 
count of the time required to give each test. 


Subjects and Procedures 


Sixty pupils in the ninth grade of a junior 
high school in central Pennsylvania partici- 
pated in this study. They were selected on the 
basis that their scores on the WISC and 
Stanford-Binet, Form L, were available from 
research by Clarke (1) completed in May, 
1950. The sample in Clarke’s study included 
85 children who comprised the entire popula- 
tion of three fifth grade classes in the school 
district. Their mean age was 11 years, 1 
month. At the time of the present investiga- 
tion, 60 of the original population were avail- 


¹ Formerly Associate Professor of Psychology, 
Pennsylvania State University. 


able for examination and their mean age was 
15 years, 2 months. Twenty-nine of the 60 
pupils were boys and 31 were girls. 

The WISC and Binet were administered 
during the spring of 1954 by three graduate 
students who had completed courses in indi- 
vidual testing taught by the senior author. 
Three other examiners who had also been in- 
structed by this same person examined these 
children when they were in fifth grade in 
1950. Testing conditions in the two investiga- 
tions were similar and it was generally pos- 
sible to alternate the administration of the 
two tests. Data for the 60 pupils from the 
original sample of 85 in the fifth grade were 
recomputed in order to provide a basis for 
direct comparisons of identical subjects for 
the two age and grade levels. 


Results and Discussion 


Table 1 describes the composition of the 
group with respect to IQ’s obtained on first 
and second examinations. The means indicate 
a population with average general intelligence. 

In the fifth grade none of the means was 
found to be significantly different from any 
of the others at the 5% level of confidence, 
but in the ninth grade the mean WISC Per- 
formance IQ was significantly greater than 
the WISC Verbal IQ. Sixty-seven per cent of 
the ninth grade pupils had higher perform- 
ance than verbal IQ’s. The difference of 6.51 
IQ points is almost identical with a finding 
from Delattre and Cole (2) working with 
children of ages 10-5 to 15-7 who tended to 
be above average in intelligence. These dis- 
crepancies suggest a possibility that the verbal 
and performance scales are not of equal diffi- 
culty at the older age levels. However, among 
other speculations concerning the cause of the 
discrepancies it is possible that the perform- 
ance test is more sensitive to “coaching,” 
when intensive testing is done in one school 
system, than is true of the verbal scale. 


Table 1 


Grade 5, 1950 
Mean Age 11-1 


Test               Mean IQ    SD 
Binet L             96.17   14.29 
WISC Verbal         96.90   10.84 
WISC Performance    99.87   13.19 
WISC Full Scale     98.13   11.20 

While there was a slight tendency for the 
IQ’s to rise from grade five to grade nine 
(except in the WISC Verbal), none of the t 
tests reached significance at the 5% level. 
Thus the mean IQ’s were stable for both the 
Binet and all scales of the WISC over the 
four-year period. 
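
The report does not say which form of the t test was applied to the means. As a rough check only, the sketch below assumes an independent-groups t computed from the summary statistics in Table 1, with n = 60 at each grade. The function name and the choice of formula are assumptions; a paired test would need the individual score pairs, which are not reported.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t from means, SDs, and ns (unpooled standard error)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / se

# Binet L: grade 5 (1950) versus grade 9 (1954), values taken from Table 1.
t_binet = t_from_summary(96.17, 14.29, 60, 98.25, 14.38, 60)
print(round(t_binet, 2))
```

Under these assumptions the Binet value comes out a little under 0.8, far below the critical value of roughly 2.0 at the 5% level.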

The relatively small sigmas would tend to 
reduce the correlations between the first and 
second testings. This restricted variability is 
largely a function of the selection of stu- 
dents from specific grade levels. Even so, the 
correlations were all found to be high and 
significant as is shown in Table 2. The sta- 
bility coefficients were markedly similar with 
none of the correlations differing from any 
other as was determined by transforming the 
r’s to z scores and testing for significance of 
difference. The WISC equalled the Binet in 
dependability for predicting itself over a 
four-year interval. 


Table 1 (continued) 


Grade 9, 1954 
Mean Age 15-2 


Test               Mean IQ    SD     Diff.    t 
Binet L             98.25   14.38    2.08    .79 
WISC Verbal         96.72   11.14    -.18    .06 
WISC Performance   103.23   14.44    4.36   1.40 
WISC Full Scale     99.82   12.79    1.69    .7[illegible] 


Table 2 


Correlation Between IQ’s [remainder of title illegible] 
Fifth and Ninth [illegible] 


Test               r 
Binet L            [illegible] 
WISC Verbal        [illegible] 
WISC Performance   [illegible] 
WISC Full Scale    [illegible] 


The data were further analyzed to obtain 
intercorrelations of the various scales and to 
see if these differed at the two grade levels. 


Table 3 


Intercorrelations of IQ’s and t Tests for Differences 
(Fifth Grade Followed by Ninth) 


                    WISC          WISC        Binet 
                    Performance   Full Scale  Form L 
WISC Verbal          .44           .83         .78 
                     .50           .86         .7[illegible] 
WISC Performance                   .86         .4[illegible] 
                                   .84         .64 
WISC Full Scale                                [illegible] 

[t values illegible] 


Table 3 shows that the abilities or characteristics 
measured by the Binet and WISC are ordered in 
essentially the same way at both grade levels. 
The largest discrepancy between correlations was 
found in the case of the WISC Performance and 
the Binet, with the higher correlation occurring 
in the ninth grade. However, none of the 
differences between correlations was significant 
at the 5% level of confidence. 
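
The comparison of correlations by transforming r's to z scores can be sketched as follows. This is the textbook Fisher r-to-z test for two independent correlations; because the same children were tested at both grades, it is only an approximation to whatever the authors computed, and the values .44 and .50 with n = 60 are used purely for illustration.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two correlations (Fisher r-to-z)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z2 - z1) / se

def two_tailed_p(z):
    """Two-tailed normal probability for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Illustrative values: r = .44 at one grade, r = .50 at the other, 60 pupils each.
z = fisher_z_difference(0.44, 60, 0.50, 60)
p = two_tailed_p(z)
print(round(z, 2), round(p, 2))
```

With correlations this close and samples of this size, the z statistic is well under 1, which agrees with the finding that none of the differences reached the 5% level.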

With regard to administration, Clarke (1) 
reported that the mean time for giving the 
Binet to the fifth grade was 76.8 minutes and 
for the WISC Full Scale 64.05 minutes, or a 
difference of 12.7 minutes. For the ninth 
grade the mean administration time was 69.2 
minutes for the Binet and 55.9 minutes for 
the WISC, the difference being 13.3 minutes. 
At both levels, then, the WISC Full Scale 
averaged about 13 minutes less testing time 
than did the Binet. 


Summary 


Results of Binet and WISC administrations 
were compared, these tests having been given 
to 60 boys and girls in fifth grade and four 
years later to the same children in ninth 
grade. Both tests yielded IQ’s that were rela- 
tively constant and showed equal stability 
over this interval of time as determined by 
comparisons between mean scores, test-retest 
correlations, and intercorrelations of major 
parts. On the average it required 13 minutes 
less to give a full WISC than to give a Binet 
at both grade levels. 


Intolerance of Ambiguity and Ethnocentrism 


Ronald Taft 


University of Western Australia 


In connection with a study on the auto- 
kinetic effect, a group of first-year psychology 
students at the University of Western Aus- 
tralia was observed in both individual and 
group situations.¹ Since these subjects had al- 
ready been given a test of ethnic prejudice, 
an opportunity presented itself to test in an- 
other culture the previous findings in the 
U.S.A. on the relationship between prejudice 
and intolerance of ambiguity. 

Following the work on this relationship by 
Frenkel-Brunswik (6), Block and Block (3) 
demonstrated that persons high on ethno- 
centrism were quick to develop personal 
norms in the extent of their perceived auto- 
kinetic movement. That is, in this unstruc- 
tured situation, high ethnocentrics showed 
little tolerance for ambiguity. Similar findings 
using other material have been obtained by 
Fisher (5), O’Connor (7), and Siegel (8). 
Furthermore, Barron (1, 2) found that high 
ethnocentrics prefer simple rather than com- 
plex geometrical figures, and since persons 
who yield to social pressure also show this 
preference, we hypothesized that in a group 
autokinetic situation high ethnocentrics would 
tend to conform to their partner’s judgments. 


Method 


The subjects consisted of 36 volunteers (21 
females and 15 males). The measure of ethno- 
centrism was a Bogardus-type scale referring 
to the following groups: American Negroes, 
Chinese, English, Irish, Italians, Jews, Poles. 
Those whose average degree of ethnic aversion 
was greater than “Would admit to my social 
group as personal friends” were regarded as 
prejudiced, and all of the others as unpreju- 


¹ The author thanks Mrs. Enid Cook for making 
available the data on the autokinetic effect. 


diced. There were 11 and 25 Ss respectively 
in each group. Among the subjects used in 
this study prejudice is mostly not extreme 
and only five of the “prejudiced” subjects 
would have excluded more than two of the 
ethnic groups from citizenship in Australia. 
On the other hand, there were 13 extremely 
unprejudiced subjects who stated that they 
would be willing to accept members of nearly 
all groups even as marriage partners. 

Autokinetic phenomenon. Each S was tested 
individually on the darkroom autokinetic ef- 
fect. In the initial session they made 20 esti- 
mates of movement in the presence of E, a 
female graduate assistant. Unlike the cus- 
tomary procedure, the instructions included a 
short explanation of the autokinetic illusion; 
the Ss were told that some people tend to see 
“a great deal of movement” and some tend to 
see “only a little” but almost everyone does 
tend to see movement of some sort. The fact 
that the Ss were told that this movement was 
an illusion distinguishes this study from that 
described by Block and Block (3). 

Approximately four weeks later, each S was 
tested similarly in the presence of some other 
S, estimates being made orally. Pairings were 
made of like sexes arranged so that persons 
seeing considerable movement in the prelimi- 
nary sessions were paired with those seeing 
little movement. 

Scoring was carried out as follows: (a) In- 
tolerance of ambiguity. Whereas Block and 
Block used a rating method of determining 
when norms were established, we used an ob- 
jective measure, the mean deviation. Because 
of the large range in the individual estimates, 
the relative variation was not a suitable meas- 
ure. Since the instructions had informed the 
Ss that there was actually no movement, the 
mean level of the estimates was used as a 
second measure of intolerance of ambiguity— 
the closer the estimates were to zero, the 
greater the intolerance. These indices were 
calculated for only the first ten judgments in 
the individual situation, in order to focus on 
S’s first reactions to the unstructured situa- 
tion. (b) Conformity to partner. This was 
determined simply by the direction of S’s 
mean judgment in the group session in rela- 
tion to his previous mean and that of his 
partner. 
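
The scoring rules above can be sketched in a few lines. The sketch assumes that "mean deviation" means the average absolute deviation from S's own mean, and that conformity is judged against the partner's mean from the preliminary session; the function names and the sample estimates are hypothetical.

```python
from statistics import mean

def ambiguity_indices(estimates, first_n=10):
    """Mean level and mean deviation of the first `first_n` estimates."""
    early = estimates[:first_n]
    level = mean(early)
    mean_deviation = mean(abs(x - level) for x in early)
    return level, mean_deviation

def converged(own_first_mean, own_group_mean, partner_first_mean):
    """True if S's group-session mean lies closer to the partner's earlier mean."""
    before = abs(own_first_mean - partner_first_mean)
    after = abs(own_group_mean - partner_first_mean)
    return after < before

# Hypothetical estimates in inches for one subject (20 judgments).
estimates = [2.0, 3.0, 2.5, 1.5, 2.0, 3.0, 2.5, 2.0, 1.0, 2.5] + [4.0] * 10
level, md = ambiguity_indices(estimates)
print(level, md, converged(level, 3.0, 5.0))
```

Only the first ten judgments enter the two indices, so the ten later estimates in the sample list have no effect on the result.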


Results 


Intolerance of ambiguity. The more highly 
prejudiced Ss made estimates that were closer 
to zero than did the unprejudiced. The mean 
estimates were 2.3 and 4.9 inches, respec- 
tively; using χ² with Yates’s correction the 
difference was not statistically significant (p 
was actually .10). The mean MD’s of each 
individual’s set of estimates were .8 and 1.0, 
respectively, also nonsignificant with a p of 
.10. However, when we combine these two 
indices we obtain results that are significant 
at the .05 level. Thus only one of the 11 
prejudiced Ss, compared with 12 out of the 
25 unprejudiced, recorded both mean esti- 
mates and mean deviations above the median 
for the group. We seem justified therefore in 
concluding that the prejudiced Ss are less able 
to tolerate ambiguity than the unprejudiced. 
A more detailed analysis of the results pro- 
vides no indication that the relationship is 
other than linear between the very unpreju- 
diced and the very prejudiced Ss. 
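
The combined index reduces to a fourfold table: 1 of 11 prejudiced Ss and 12 of 25 unprejudiced Ss were above the median on both measures. The sketch below computes chi-square for that table. The report does not state whether the correction, or a one-tailed probability, was applied to the combined index, so both the corrected and uncorrected statistics are returned; the function name is an assumption.

```python
import math

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square (1 df) and its upper-tail p for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates's continuity correction, floored at zero.
        diff = max(diff - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Rows: prejudiced, unprejudiced. Columns: both indices above the median, not both.
corrected = chi_square_2x2(1, 10, 12, 13, yates=True)
uncorrected = chi_square_2x2(1, 10, 12, 13, yates=False)
print(corrected, uncorrected)
```

The two statistics differ noticeably in a sample of 36, so the choice of correction matters when the result is compared with the .05 level.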

Conformity to social pressure. Results on 
the conformity index were available for only 
21 Ss (7 prejudiced and 14 unprejudiced), 
but the trend was a definite one. The un- 
prejudiced Ss tended to move away from 
rather than toward their partners. Only 6 out 
of 14 unprejudiced Ss converged compared 
with 6 out of 7 of the prejudiced Ss. This 
finding lends further support to the expecta- 
tion that prejudiced persons are more de- 
pendent on their group, especially in an un- 
structured situation. Once more this trend 
was a linear one. 


Conclusions 


Further evidence is provided that those 
subjects who are comparatively higher on 
ethnocentrism are also more inclined to adopt 
an anchoring point quickly in an unstruc- 
tured situation. The anchoring points used in 
the above experiment were zero movement, a 
steady personal norm, and the norms of one’s 
partner. These results support Dana’s gen- 
eralized dictum “Ethnic prejudice is congruent 
with minimal resistance to the environment in 
which stereotypes are accepted, thinking is 
eschewed for conformity and security of ab- 
solutes” (4, p. 14). 


A WISC Profile for Retarded Readers 


Grace T. Altus 


Santa Barbara County Schools 


This investigation posed the question: Is 
there a distinctive test pattern associated 
with the scores on the Wechsler Intelligence 
Scale for Children (WISC) for children with 
severe reading disabilities? 

A severe reading disability was here de- 
fined as a discrepancy of two years or more 
between a given child’s expected reading level 
as derived from his Full Scale WISC IQ and 
his actual reading level as measured by a 
standardized reading test. All children in the 
sample had been referred to the Guidance 
Department of the Santa Barbara County 
Schools by their teachers because of severe 
academic disabilities. All had Full WISC IQ’s 
of 80 or more, spoke only English at home, 
had taken at least four subtests on each 
WISC scale, and were between third and 
eighth grade when given the reading test. 

A group of 25 children from 12 elementary 
schools met these criteria. Twenty-four of 
them were boys—an exaggerated representa- 
tion of the usual finding that boys outnumber 
girls as reading problems. The intelligence of 
the group was normal. Mean WISC IQ’s were 
97.8, 100.4 and 98.6 on the Verbal, Perform- 
ance, and Full Scales respectively. There was 
less variability in the sample than in an un- 
selected population as shown by standard 
deviations of 9.9, 10.3 and 9.2 IQ points on 
the same scales. 

Since the mean Verbal-Performance IQ dis- 
crepancy was negligible, it was clearly not 
differentially diagnostic. However, the subtest 
patterning appears to be fairly distinctive, as 
shown in Figure 1. Coding and Arithmetic 
are significantly lower than Vocabulary, Digit 
Span, Picture Completion, Object Assembly, 
and Picture Arrangement at the .01 level of 


confidence. The Information subtest is sig- 
nificantly lower than Picture Completion at 
the .01 level and lower than Vocabulary and 
Digit Span at the .02 level of confidence. Had 
the positive correlations among the various 
subtests been taken into account in computing 
the significance of differences between subtest 
means, the chances of true differences would 
have been increased and Similarities would 
probably also have been included in the “low” 
subtests. 


Fig. 1. Mean WISC subtest scaled scores of 
retarded readers. 


The obtained WISC pattern is strikingly 
similar to the findings reported by Altus (1) 
regarding the differential validity of Wechsler- 
Bellevue subtests in predicting graduation of 
trainees from a camp for illiterate soldiers. 
Arithmetic, Information, and Digit Symbol 
(Coding) subtests were there shown to be 
highly effective in predicting graduation, i.e., 
achievement of technical literacy. 

Thus the same subtests which are here sig- 
nificantly lower for children with severe read- 
ing disabilities were also lower for adult sol- 
diers who failed to read at a specified level 
after instruction in reading. The consistency 
of the findings suggests that the profile herein 
described, while based on a small number of 
cases, may be reasonably characteristic. 


Received August 17, 1955. 


Test Reviews 


Wechsler, David. Wechsler Adult Intelligence Scale 
(WAIS). New York: Psychological Corporation, 
1955. 


Reviewed by 


Roy Schafer 
Yale University Medical School 


The Wechsler-Bellevue Intelligence Scale (W-B) 
has had its face lifted. In form and substance, how- 
ever, it is still very much with us in the Wechsler 
Adult Intelligence Scale (WAIS). The WAIS revision 
contains no new types of subtest. Its structure, like 
that of the W-B, makes possible not only pattern 
analysis of the equated subtest scores, but also com- 
parison of Verbal and Performance IQ’s, analysis of 
the sequence of passes and failures within each rela- 
tively homogeneous subtest as item difficulty in- 
creases, and qualitative analysis of the attempted 
task solutions. 

As to whether the face-lifting has done much for 
the test, my impression is mixed; it is one of old 
wrinkles gone and new ones added and old charms 
gone and new ones added. Viewed generally, the 
WAIS remains our best clinical instrument for study- 
ing certain ego functions, such as judgment, con- 
centration, and concept formation, as these operate 
in relatively well-structured, impersonal task situa- 
tions. Clinical testers must study ego functions in 
such formalized and conventionalized situations in 
order to complete in each case their gross survey of 
personality organization and dynamics. Clinical ex- 
perience with a battery of tests, including the Ror- 
schach and TAT, repeatedly demonstrates that both 
versions of Wechsler’s intelligence test contribute 
basic data both for locating significant areas of 
strength and weakness in defensive and adaptive 
functioning and for estimating the degree of strength 
or pathological impairment in these areas. Especially 
in studying borderline and acute psychotic conditions 
are these contributions important. 

An outstanding improvement on the W-B is the 
all-new Vocabulary list containing mostly verbs, 
adjectives, and abstract or “literary” nouns (like 
compassion, calamity, and fortitude). The W-B Vo- 
cabulary contains more items (like harakiri and 
guillotine) overlapping Information. The WAIS Vo- 
cabulary more directly surveys the subject’s means 
of verbal self-expression and through this it better 
defines his interest in verbal organization and com- 
munication of experience. On the other hand Infor- 
mation now contains two items that overlap Com- 
prehension (“Why are dark clothes warmer than 
light-colored clothes?” and “How does yeast cause 
dough to rise?”); also Comprehension now contains 
three items that overlap tests of concept formation 
(three proverbs). Thus, not all the changes are in 
the direction of greater homogeneity within subtests. 

Also commendable is the extension of the test’s 
range in three important respects. (a) In most of 
the subtests, lower levels of intellectual function may 
now be tapped and scored through the inclusion of 
intellectually primitive lead-off items. Less often, the 
upper levels have also been raised. (b) The age 
range covered by the test has been extended upward 
by the establishment of norms through the ages 70- 
74 and even into the nebulous range of “75 years 
and over.” Unlike the W-B, the WAIS does not 
cover the age range 10 through 15; the Wechsler In- 
telligence Scale for Children now does that job. In 
connection with matters of age, praise is due for the 
provision of tables whereby the weighted subtest 
scores may be, so to speak, corrected for age. This 
“correction” is achieved by comparing subtest per- 
formance with the average of the subject’s age peers 
rather than comparing it with the average of the 
standard reference group, aged 20-34. Diagnostic 
scatter analysis in the older age ranges especially 
may well be furthered by these supplementary tables. 
(c) The third important extension is that the stand- 
ardization population now includes Negroes and 
rural-agricultural persons and the sample was drawn 
from over the entire country. We may therefore 
apply the test with more confidence to our highly 
varied clinical populations. (It would be most inter- 
esting to see a breakdown of the data in terms of 
urban-rural, and Negro-White, as well as male-fe- 
male.) 

Most of the subtests have been lengthened. For 
clinicians this is a mixed blessing. On the one hand, 
greater length makes the test more time-consuming, 
and this increases the temptation to drop subtests or 
to use a short form of the test; on the other hand, 
greater length yields a broader sample of behavior 
in each subtest area and it increases score reliability. 
Increased score reliability, together with the WAIS’s 
higher correlations between subtests and between 
single subtests and total test score, means on the 
average a narrowing of individual scatter of subtest 
scores. Consequently lesser amounts of scatter will 
now tend to be significant. However, this change 
may be more or less offset by the slight expansion 
of the weighted score scale (0-19). 

The goal of eliminating ambiguous W-B items (p. 
1) appears to account for the replacement of a 
number of items throughout the test. This revision 
too is a mixed blessing. A degree of ambiguity (or 
whatever makes for disruption of response—as in 
the frequent Information error that there are four 
pints in a quart) adds “projective” richness to in- 
telligence test results. Ambiguity highlights the way 
repressive uncertainty limits and undermines intel- 
lectual attainment; it pinpoints confusional tend- 
encies; it provokes obsessional rumination; it facili- 
tates anxiety-laden temporary inefficiencies. To the 
clinician, misunderstandings and inefficiencies are not 
random events; they point to significant patterns of 
functioning. We must remember in this regard that 
there is no one standard for choosing items to study 
intelligence. In our lives our “intelligence” must cope 
with tasks involving all degrees of structure, all de- 
grees of conventionality and familiarity, all degrees 
of complexity, all degrees of value-loading and con- 
flict-relevance. It is in this sense that the Rorschach 
test becomes in part an intelligence test too, though 
it yields no precise score. Wechsler’s psychometric 
approach to item selection almost necessarily favors 
as “well conceived” items that are easy and quick to 
administer and neatly scorable. But rounded clinical 
practice requires a description of more than IQ 
level; it concerns itself with intellectual potentiali- 
ties, ambitiousness, “style,” stress tolerance, and in- 
hibitions. These intelligence characteristics can be 
culled only from the results of a battery of tests, 
and only when all test results are viewed from the 
standpoint of contemporary dynamic personality 
theory. In any event, the WAIS seems to be a more 
efficient machine than the W-B. Because of this, it 
seems reasonable to expect that temporary ineffi- 
ciencies and other deviations of response will be 
even more diagnostically significant on the WAIS 
than the W-B. 

Throughout the presentation of the WAIS it re- 
mains clear that Wechsler’s approach to intelligence 
is limited by psychometric considerations. Dynamic 
considerations are underplayed. Concrete evidence 
for this is found in the crowded test blank which 
discourages verbatim recording and detailed qualita- 
tive notes. Similarly, the test manual completely 
ignores the many problems of technique so fre- 
quently encountered in clinical intelligence testing. 
What should we do when the subject too quickly 
gives up on items? How might we spot and follow 
up the implicit concreteness, misunderstandings, per- 
ceptual distortions, or bizarre conceptions often un- 
derlying responses? The WAIS manual even discour- 
ages clinical probing or testing the limits (p. 28). 
From the standpoint of getting an IQ that conforms 
neatly to the WAIS standardization, this discourag- 
ing attitude toward flexible and imaginative test ad- 
ministration is understandable. When, however, we 


use the test to help understand an individual, as well 
as to help estimate his general intelligence level, we 
need the standard administration as a firm guide 
rather than as a set of commandments; inevitably 
we must go beyond this guide, not rashly or fre- 
quently but with that sophisticated freedom that 
helps us study response processes and helps us 
clarify diagnostic and dynamic implications of logi- 
cal or emotional disruption of responses. For ex- 
ample, anyone who has routinely asked subjects for 
the stories they have in mind on the last two Pic- 
ture Arrangement items will know how often correct 
sequences are not backed up by a correct under- 
standing of the stories, and vice versa. 

While recognizing the practical difficulties in its 
way, I am sorry to see that no new type of subtest 
has been introduced. There is, for example, a crying 
need for a test of immediate memory for meaningful 
material; Digit Span and Digit Symbol are not satis- 
factory in this regard. It remains desirable, there- 
fore, to supplement the administration of the WAIS 
with a Story Recall test such as that in Babcock’s 
Test for measuring mental deterioration. Also un- 
changed—and unchallenged—is the view that the 
breakdown of the Full Scale IQ into “Verbal” and 
“Performance” components is adequate and satis- 
factory. Inspection of the tables of subtest correla- 
tions (pp. 15-17) makes it evident that Digit Span 
particularly and Arithmetic to a lesser extent do not 
cluster with the other four Verbal subtests. Also, 
performance on Digit Span and Arithmetic is rela- 
tively vulnerable to psychopathology. In deriving the 
Verbal IQ in clinical practice we therefore often get 
a better estimate of the patient’s general verbal level 
by omitting pathologically lowered Digit Span and/ 
or Arithmetic scores and then extrapolating. Another 
desirable change has not been made: the absence of 
part credits in the Block Design items continues to 
limit the sensitivity of scoring of that subtest. I 
wonder, by the way, why the colors on the blocks 
were changed. Such a change disrupts the continuity 
of our experience with test materials. I hope that 
the promised revision of The Measurement of Adult 
Intelligence will account for this change and for 
others that are not self-evident. 

Clinicians will find it easy to switch from the 
W-B to the WAIS. Although the content of a fair 
number of items has been changed, the type of item 
remains the same throughout. Certain changes should 
be stressed, however: (a) some of the subtest in- 
structions have been reformulated; (b) some of the 
criteria for scoring Information, Comprehension, and 
Similarities items have been altered; (c) the time 
limits and bonuses on performance items have been 
revised; (d) the IQ limits of the Defective and 
Very Superior ranges have been changed. 

I must say in conclusion that a definitive clinical 
assessment of the WAIS must await two develop- 
ments: (a) the appearance of Wechsler’s revision of 
his book, for at the present time Wechsler knows so 
much more than the rest of us about the phenome- 
nology of the revised test; (b) passage of enough 
time to allow the WAIS to be applied to a wide va- 
riety of subjects in routine clinical practice and in 
research projects. This review is thus no more than 
a record of my initial orientation while switching 
from the W-B to the WAIS. A follow-up appraisal 
of the WAIS five years from now seems definitely 
in order. 


Wechsler, David. Wechsler Adult Intelligence Scale 
(WAIS). New York: Psychological Corporation, 
1955. 


Reviewed by 
Quinn McNemar 


Stanford University 


This revision (called WAIS) of the 1939 Wechsler- 
Bellevue Intelligence Scale retains the distinctive fea- 
tures of the original. About two-thirds of the earlier 
items have survived the rigors of item analysis and 
the over-all number of items has been increased from 
217 to 257, but this increase is spread so unevenly 
over the subtests (23 of the 40 added items are 
Digit Symbol items) that the effective increase in 
total scale length is less than 10 per cent. The Vo- 
cabulary test, consisting of entirely new items, is an 
integral part of the Verbal and Full Scales. 

The standardization sample of 1,700, ranging in 
age from 16 to 64, is claimed to be truly representa- 
tive. It is a stratified sample with controls on sex, 
geographic region, urban-rural residence, race (white 
vs. nonwhite), occupational level, and education. A 
flaw in the sampling procedure was the (admitted) 
inability to select randomly within the strata. The 
reviewer shares the author’s belief that any bias in- 
troduced by failure to follow the ideal random 
method is small. In other words, the reviewer re- 
gards the standardization sample as excellent, a 
marked improvement over that for the 1939 scale 
and over the norm groups for other scales of the 
same or earlier vintage. 

A supplementary group of 352 cases ranging in 
age from 60 to beyond 75 provides the basis for 
norms at older age levels. Two claims are made for 
this group: it represents a good sampling from a 
typical American city (Kansas City, Mo.) and the 
test results for the 101 cases in the 60-64 age 
bracket are very similar to those of the same age in 
the main sample. 

This time Wechsler has paid more attention to 
the problem of test reliability: split-half reliabilities 
are presented for each of the 11 subtests (except 
Digit Symbol) for each of three age groups. These 
reliabilities range from the .60’s to the .90’s, with a 
median value of .82. The reliabilities for Verbal, Per- 
formance, and Full Scale IQ’s are reported as .96, 
.93, and .97, respectively. The reviewer is somewhat 
skeptical of these three coefficients, and hazards the 
prediction that test-retest reliabilities will not be as 
high. 

Since WAIS is built on exactly the same base as 
its predecessor, it must be presumed that the prin- 
ciples underlying the old scale are being propagated. 


Stated differently, the author of WAIS shows no 
cognizance of the vast factor-analysis literature of 
the past 20 years. Regardless of whether one identi- 
fies with some sort of g school (typically British, 
and apparently adhered to by Wechsler) or with 
the primary ability (differential aptitude) approach, 
one cannot escape a lesson regarding test construc- 
tion which emerges forcefully from the factor litera- 
ture: the test constructor should strive for a pure 
measure of whatever he hopes to quantify. That is, 
a score should represent a point on a unidimensional 
scale rather than a hodgepodge of different dimen- 
sions. 

Consider so-called Verbal IQ’s obtainable on WAIS. 
Two persons with the same total weighted score on 
five of the tests can readily differ by as much as 10 
points on the Digit Span or 6th test, thereby differ- 
ing by 10 points on the Verbal IQ scale. Apparently, 
the user of WAIS is expected to condone such pos- 
sible qualitative differences among Verbal IQ’s, dif- 
ferences which are further complicated by the con- 
tinued inclusion of the Arithmetic test as one of the 
measures of verbal ability. 

On page 13 of the Manual a word of caution is 
injected regarding the possible unreliability of dif-
ferences between scores on two subtests (same indi-
vidual), but when we reach p. 18 we learn that
“differences as large as five points may be unusual 
enough to be noteworthy.” This statement, which is 
based on the statistical fact that the median value 
of the standard deviations of the 55 distributions of 
difference scores “was approximately three,” ignores 
two other statistical facts; namely, the standard 
deviations (none of which is reported) vary from 
about 1.9 to about 3.6 and the reliability of the dif- 
ference scores will vary considerably among the 
sets of difference scores. And some of the reliabili-
ties for difference scores will be disturbingly low (in
the .50’s and .60’s). Since difference scores have been
and are advocated for use in clinical diagnosis, the
failure of the author to include in the Manual reli-
ability coefficients and errors of measurement for dif-
ference scores is absolutely inexcusable.
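
The point about difference scores can be illustrated with the standard formulas for two scores of equal variance. The sketch below is not taken from the Manual: the subtest reliabilities (.70 and .75) and the intercorrelation (.45) are hypothetical values, and the scaled-score standard deviation of 3 is assumed.

```python
import math


def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y, assuming equal variances.

    r_xx, r_yy: reliabilities of the two subtests.
    r_xy: correlation between the two subtests.
    """
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)


def sd_of_difference(sd: float, r_xy: float) -> float:
    """Standard deviation of X - Y when both scores have the same SD."""
    return sd * math.sqrt(2.0 * (1.0 - r_xy))


# Hypothetical values for illustration only (not from the Manual).
print(round(difference_reliability(0.70, 0.75, 0.45), 2))
print(round(sd_of_difference(3.0, 0.45), 2))
```

Under these assumptions, two moderately reliable subtests that correlate .45 yield a difference score with a reliability of only about .50. The higher the intercorrelation, the smaller both the spread and the reliability of the differences.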

In the opinion of the reviewer, the author of WAIS 
has attempted an impossible task: the construction 
of a scale to measure general (global) intelligence 
which at the same time will provide differences 
among subtests which are of diagnostic value. The 
eleven subtests are too diverse in content and yield 
intercorrelations too low to permit a satisfactory 
realization of the first aim, whereas they are not 
diverse enough (in factor-analytic sense) to yield 
the low intercorrelations necessary for reliable dif- 
ference scores. 


New Tests 


Buhler, Charlotte, & Mandeville, Kathryn. The Five 
Task Test (FTT). Ages 9-13. 1 form. (20) min.
Record booklet ($7.50 per 50); set of colored pads
($4.00); Ball-and-field pad ($2.00); manual, pp.
24, mimeographed ($3.00); complete set ($15.00).
Beverly Hills, Calif.: Western Psychological Serv- 
ices, 1955. 

Buhler’s “five tasks” are related performance tests,
which are reported to have some relationships to emo- 
tionality and organic brain damage. The first three 
tasks are to cut a circle, a heart, and a star from 
four-inch squares of paper. The qualities of the prod- 
ucts are rated against scaled specimens, and the num- 
ber of scraps is counted as a measure of “emotion- 
ality.” The fourth task is cutting “anything you wish” 
and is given a projective interpretation. The fifth
task, the Terman ball-and-field test, is rated as nor- 
mal, borderline, or problematic with the aid of scoring 
specimens. Like many other behavior samples, per- 
formance on these tasks may give a sensitive and 
alert psychologist many clues about a child’s typical 
responses to his world. The evidence supporting 
these particular tasks as clinical instruments is weak. 
No data on reliability are given; the very moderate 
reported validities were tested against inadequately 
defined criteria. Perhaps the test is worthy of fur- 
ther research; perhaps it will remain a subjective 
instrument of some value to clinicians who depend 
on observations and hunches rather than on scores.— 
L. F. S.


Buhler, Charlotte, & Howard, Gertrude. Personality 
Evaluation Form (PEF). 1 form. Booklet ($3.50 
per 25) with manual, pp. 11, mimeographed. Bev- 
erly Hills, Calif.: Western Psychological Services, 
1955. 

Described by its authors as “a technique for the 
organization and interpretation of personality data,” 
the Personality Evaluation Form is a 12-page book- 
let consisting mainly of a series of headings with 
blank spaces for entering data, inferences, and con- 
clusions. As an outline for case study, the form has 
some merit. Like all such blanks, however, it lacks 
flexibility. In attempting to meet the requirements 
of all situations and all clients, it inevitably pro- 
vides too little space for some topics and too much 
for other less relevant issues. Somewhat dangerous 
is the implication that it may be used by teachers 
with little psychological training. Well-trained cli- 
nicians hardly need such a blank; others should not 
attempt its use.—L. F. S.


Cooperative School and College Ability Tests 
(SCAT). College entrants and freshmen, 4 forms; 
grades 10-12, 2 forms. 70 (100) min. Test booklet 
($3.25 per 25) with directions, pp. 14; hand-scor-
ing answer sheets ($1.25 per 25), keys (35¢ set);
IBM answer sheets ($1.25 per 25), keys (35¢ set);
manual, pp. 57 ($1.00). Princeton, N. J., & Los 
Angeles, Calif.: Educational Testing Service, 1955. 


The new Cooperative tests are an impressive con- 
tribution to the development of a group-adminis- 
tered battery designed for the limited but important 
function of educational prediction. Each test con- 
tains four parts—sentence completion, vocabulary, 
numerical computation, and numerical problem solv- 
ing—selected for optimal concurrent validity from 
among nine types of items recommended by a panel 
of test specialists. Three scores are obtained, verbal, 
quantitative, and total, each expressed by a new type 
of three-digit scaled score and interpreted in terms of 
percentile norms. Currently available are two forms 
of Test 2 for use in the senior high school or with 
superior ninth-grade students, and four forms of 
Test 1, two of which have restricted distribution, 
for college entrants, freshmen, and sophomores. Fur-
ther research is under way to produce additional
levels, all with similar types of items, for use as far 
down the educational structure as the fourth or fifth 
grade. 

Although the manual is modestly labelled as pre- 
liminary, it contains impressive data about the de- 
velopment of the tests. It is equally emphatic about 
their limitations. Kuder-Richardson reliabilities are 
.93, .91, and .95 for the verbal, quantitative, and 
total scores. A short form for survey purposes, con- 
taining two subtests and requiring only 50 minutes, 
has a reliability of .91 and correlates .97 with the 
longer test. Validities are presented in terms of the 
concurrent correlations with marks in school sub-
jects; predictive validities are to be obtained from
future studies. A valuable feature is the emphasis on 
the confidence intervals of scores and a convenient 
graphic device for showing that a score represents a 
probable range of ability, not an absolute point. 
The provisional norms are based on a wide sampling 
of about 9,000 students in high schools and about 
2,500 in colleges. 
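
The idea that a score marks a probable range rather than a point can be sketched with the standard error of measurement. The example below uses the reported total-score reliability of .95; the obtained score of 300 and the scale standard deviation of 10 are hypothetical, and the construction of the manual's own graphic device is not reproduced here.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def score_band(score: float, sd: float, reliability: float, z: float = 1.0):
    """Return (low, high) limits of score +/- z standard errors."""
    sem = standard_error_of_measurement(sd, reliability)
    return (score - z * sem, score + z * sem)


# Reliability .95 is from the review; the score and SD are hypothetical.
low, high = score_band(score=300.0, sd=10.0, reliability=0.95)
print(round(low, 1), round(high, 1))
```

A band of one standard error on either side of the obtained score covers roughly two-thirds of the scores expected on repeated testing, on the assumption that errors are normally distributed.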

Few psychometric devices have equaled the new 
SCAT in the sophistication of the test construction 
research, and in the scope and clarity with which 
the data are presented.—L. F. S.